transformers

riaz.somc/transformers

Fork 0

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 05:10:06 +06:00

Commit Graph

Author	SHA1	Message	Date
Klaus Hipp	721ee783ca	[Docs] Fix spelling and grammar mistakes (#28825 ) * Fix typos and grammar mistakes in docs and examples * Fix typos in docstrings and comments * Fix spelling of `tokenizer` in model tests * Remove erroneous spaces in decorators * Remove extra spaces in Markdown link texts	2024-02-02 08:45:00 +01:00
Sayak Paul	4116d1ec75	[Examples/TensorFlow] minor refactoring to allow compatible datasets to work (#22879 ) minor refactoring to allow compatible datasets to work.	2023-04-20 18:21:01 +05:30
Sayak Paul	390e121fb5	[Examples] TPU-based training of a language model using TensorFlow (#21657 ) * add: tokenizer training script for TF TPU LM training. * add: script for preparing the TFRecord shards. * add: sequence of execution to readme. * remove limit from the tfrecord shard name. * Add initial train_model.py * Add basic training arguments and model init * Get up to the point of writing the data collator * Pushing progress so far! * Complete first draft of model training code * feat: grouping of texts efficiently. Co-authored-by: Matt <rocketknight1@gmail.com> * Add proper masking collator and get training loop working * fix: things. * Read sample counts from filenames * Read sample counts from filenames * Draft README * Improve TPU warning * Use distribute instead of distribute.experimental * Apply suggestions from code review Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Modularize loading and add MLM probability as arg * minor refactoring to better use the cli args. * readme fillup. * include tpu and inference sections in the readme. * table of contents. * parallelize maps. * polish readme. * change script name to run_mlm.py * address PR feedback (round I). --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2023-04-14 10:41:01 +05:30

Author

SHA1

Message

Date

Klaus Hipp

721ee783ca

[Docs] Fix spelling and grammar mistakes (#28825 )

* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts

2024-02-02 08:45:00 +01:00

Sayak Paul

4116d1ec75

[Examples/TensorFlow] minor refactoring to allow compatible datasets to work (#22879 )

minor refactoring to allow compatible datasets to work.

2023-04-20 18:21:01 +05:30

Sayak Paul

390e121fb5

[Examples] TPU-based training of a language model using TensorFlow (#21657 )

* add: tokenizer training script for TF TPU LM training.

* add: script for preparing the TFRecord shards.

* add: sequence of execution to readme.

* remove limit from the tfrecord shard name.

* Add initial train_model.py

* Add basic training arguments and model init

* Get up to the point of writing the data collator

* Pushing progress so far!

* Complete first draft of model training code

* feat: grouping of texts efficiently.

Co-authored-by: Matt <rocketknight1@gmail.com>

* Add proper masking collator and get training loop working

* fix: things.

* Read sample counts from filenames

* Read sample counts from filenames

* Draft README

* Improve TPU warning

* Use distribute instead of distribute.experimental

* Apply suggestions from code review

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Modularize loading and add MLM probability as arg

* minor refactoring to better use the cli args.

* readme fillup.

* include tpu and inference sections in the readme.

* table of contents.

* parallelize maps.

* polish readme.

* change script name to run_mlm.py

* address PR feedback (round I).

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

2023-04-14 10:41:01 +05:30

3 Commits