* changes in create optimizer to support tensor parallelism with SMP
* Update src/transformers/trainer.py
Convert if check to one line.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com>
* Add doctest BERT
* make fixup
* fix typo
* change checkpoints
* make fixup
* define doctest output value, update doctest for mobilebert
* solve fix-copies
* update QA target start index and end index
* change checkpoint for docs and reuse defined variable
* Update src/transformers/models/bert/modeling_tf_bert.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* make fixup
* Add Doctest for Albert and Bigbird
* make fixup
* overwrite examples for Albert and Bigbird
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update longer examples for Bigbird
* using examples from squad_v2
* print out example text
* change name token-classification-big-bird checkpoint to random
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add tflops logging and fix grad accumulation
* add accelerate tracking and checkpointing
* scale loss of last batch correctly
* fix typo
* compress loss computation
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add resume from checkpoint argument
* add load_state accelerate from checkpoint, register lr scheduler and add tflops function
* reformat code
* reformat code
* add condition on path for resume checkpoint
* combine if conditions
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add source for tflops formula
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
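The log does not spell out the TFLOPs formula or its source. As a rough sketch only, assuming the widely used transformer FLOPs estimate from the Megatron-LM scaling literature (an assumption, not confirmed by these commits), the computation could look like this. The function name `estimate_tflops` and all of its parameters are hypothetical.

```python
def estimate_tflops(elapsed_seconds, batch_size, seq_length, num_layers,
                    hidden_size, vocab_size, num_devices=1,
                    gradient_checkpointing=False):
    """Estimate achieved TFLOPs per device for one training iteration.

    `batch_size` is the global number of sequences processed per iteration
    (all devices and all gradient accumulation steps combined).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Assumed factor: 3 covers forward + backward; 4 adds the extra forward
    # pass needed when activations are recomputed (gradient checkpointing).
    checkpoint_factor = 4 if gradient_checkpointing else 3
    factor = (24 * checkpoint_factor * batch_size * seq_length
              * num_layers * hidden_size ** 2)
    flops_per_iteration = factor * (
        1.0
        + seq_length / (6.0 * hidden_size)
        + vocab_size / (16.0 * num_layers * hidden_size)
    )
    # Convert FLOPs per iteration into TFLOPs per second per device.
    return flops_per_iteration / (elapsed_seconds * num_devices * 1e12)
```

The value is only an estimate of model FLOPs; it ignores data loading and communication time except through `elapsed_seconds`.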
* add gptj to TOKENIZER_MAPPING_NAMES
* change int32 to float to avoid a problem in ONNX
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
- All activations should be fetched through `ACT2FN`.
- `ACT2FN` returns ReLU as an `nn.Module`, which allows attaching hooks to the activation function and makes it appear in the output of `print(model)`.
* Adding support for `array` key in raw dictionaries in ASR pipeline.
* ES .
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Making it work by not popping `array` first.
* Black 22.3
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
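The key handling described above can be sketched in plain Python. This is an illustration under assumptions, not the pipeline's actual code: the helper name `extract_audio` is hypothetical, and the `raw` and `sampling_rate` keys are assumed to be the ones the pipeline already accepted before `array` was added.

```python
def extract_audio(inputs):
    """Return (audio, sampling_rate) from a raw input dictionary.

    Hypothetical helper: accepts the audio under either "raw" or "array".
    """
    if not isinstance(inputs, dict):
        raise TypeError("inputs must be a dictionary")
    if "sampling_rate" not in inputs or not ("raw" in inputs or "array" in inputs):
        raise ValueError(
            'expected a "sampling_rate" key and either a "raw" or an "array" key'
        )
    remaining = dict(inputs)  # shallow copy so the caller's dict is untouched
    # Try "raw" first and fall back to "array" only when "raw" is missing.
    audio = remaining.pop("raw", None)
    if audio is None:
        audio = remaining.pop("array", None)
    if audio is None:
        raise ValueError("no audio data found in inputs")
    return audio, remaining.pop("sampling_rate")
```

For example, `extract_audio({"array": samples, "sampling_rate": 16000})` returns the samples and the rate unchanged, so a dictionary that stores audio under `array` can be passed through without renaming keys.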
* Correct Logging of Eval metric to Tensorboard
An empty dictionary ``eval_metrics`` was being logged; it is replaced by ``eval_metric``, which is the output dictionary of ``metric.compute()``.
* Remove unused variable
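The bug pattern described above can be reproduced with a small stand-in. Everything here is hypothetical scaffolding (the `AccuracyMetric` class and the `log_scalars` helper are not from the affected script); only the variable names `eval_metrics` and `eval_metric` come from the commit message.

```python
class AccuracyMetric:
    """Tiny stand-in for a metric object exposing add_batch() and compute()."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def add_batch(self, predictions, references):
        self.correct += sum(p == r for p, r in zip(predictions, references))
        self.total += len(references)

    def compute(self):
        return {"accuracy": self.correct / self.total}


def log_scalars(log, metrics, step):
    """Append one (tag, value, step) record per metric, like a scalar writer."""
    for name, value in metrics.items():
        log.append((f"eval_{name}", value, step))


metric = AccuracyMetric()
metric.add_batch(predictions=[1, 0, 1, 1], references=[1, 0, 0, 1])

eval_metrics = {}               # never filled in, so logging it records nothing
eval_metric = metric.compute()  # the dictionary that actually holds the results

buggy_log, fixed_log = [], []
log_scalars(buggy_log, eval_metrics, step=100)  # old behaviour: no records
log_scalars(fixed_log, eval_metric, step=100)   # fixed behaviour: one record
```

Because logging an empty dictionary raises no error, the old behaviour failed silently, which is why the fix is easy to miss without inspecting the logged output.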
* Add doc about `attention_mask` on gpt2
Add a simple sentence describing how `attention_mask` needs to be constructed when `past_key_values` is used.
* Add doc about attention_mask on gpt2_tf
* clean up style
* remove empty line white spaces
* remove whitespace in empty line
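As a minimal sketch of the `attention_mask` note, assuming the added documentation means that the mask must cover the cached positions as well as the new `input_ids` (so each row has length past length + new length), the mask can be built by concatenation. The helper name `extend_attention_mask` is hypothetical, and plain lists stand in for tensors.

```python
def extend_attention_mask(past_mask, new_mask):
    """Concatenate masks row by row so each row covers cached + new tokens."""
    if len(past_mask) != len(new_mask):
        raise ValueError("past_mask and new_mask must have the same batch size")
    return [list(past_row) + list(new_row)
            for past_row, new_row in zip(past_mask, new_mask)]


# Two sequences with three cached tokens each; the second was left-padded.
past_mask = [[1, 1, 1], [0, 1, 1]]
# One new token per sequence is fed together with the cache.
new_mask = [[1], [1]]

full_mask = extend_attention_mask(past_mask, new_mask)
```

The point of the sketch is that the padding decisions made for the cached tokens are carried forward rather than dropped when only the new tokens are passed as input.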
* Add first draft
* Improve README and run fixup
* Make script aligned with other scripts, improve README
* Improve script and add test
* Remove print statement
* Apply suggestions from code review
* Add num_labels to make test pass
* Improve README