Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-05 13:50:13 +06:00

* add: tokenizer training script for TF TPU LM training.
* add: script for preparing the TFRecord shards.
* add: sequence of execution to readme.
* remove limit from the tfrecord shard name.
* Add initial train_model.py
* Add basic training arguments and model init
* Get up to the point of writing the data collator
* Pushing progress so far!
* Complete first draft of model training code
* feat: grouping of texts efficiently.

Co-authored-by: Matt <rocketknight1@gmail.com>

* Add proper masking collator and get training loop working
* fix: things.
* Read sample counts from filenames
* Read sample counts from filenames
* Draft README
* Improve TPU warning
* Use distribute instead of distribute.experimental
* Apply suggestions from code review

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Modularize loading and add MLM probability as arg
* minor refactoring to better use the cli args.
* readme fillup.
* include tpu and inference sections in the readme.
* table of contents.
* parallelize maps.
* polish readme.
* change script name to run_mlm.py
* address PR feedback (round I).

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
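The "grouping of texts efficiently" step refers to the standard trick of packing tokenized documents into fixed-length blocks for LM training. A minimal sketch of that approach, written as a datasets-style batched mapping function; the block_size of 512 and the column names are placeholders, not values taken from the script:

from itertools import chain

def group_texts(examples, block_size=512):
    # Concatenate every tokenized column into one long sequence, then
    # re-split it into fixed-size blocks, dropping the ragged tail.
    # chain() avoids the quadratic cost of repeated list concatenation.
    concatenated = {k: list(chain(*examples[k])) for k in examples}
    total_length = len(concatenated["input_ids"]) // block_size * block_size
    return {
        k: [v[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, v in concatenated.items()
    }

The "parallelize maps" item presumably applies this with datasets' batched, multi-process map, e.g. dataset.map(group_texts, batched=True, num_proc=4).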
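The "proper masking collator" and "MLM probability as arg" items match transformers' built-in masked-LM collator, which randomly masks tokens at collation time. A sketch assuming that collator; the checkpoint name and the 0.15 probability are illustrative defaults, not values from the commit:

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # placeholder checkpoint
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # exposed as a CLI argument per the commit log
    return_tensors="tf",   # TF tensors, matching the TF/TPU training loop
)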
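"Use distribute instead of distribute.experimental" refers to TensorFlow's TPU strategy moving out of the experimental namespace. A minimal initialization sketch, assuming a Cloud TPU VM where an empty resolver address discovers the local TPU:

import tensorflow as tf

# Resolve and initialize the TPU; tpu="" is a placeholder address.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Stable API, replacing the older tf.distribute.experimental.TPUStrategy.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Build and compile the model here so its variables are replicated
    # across TPU cores.
    ...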
Requirements file (4 lines, 56 B, Plaintext):
transformers==4.26.1
datasets==2.9.0
tokenizers==0.13.2
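These pins can be installed directly to reproduce the example's environment, e.g.:

pip install transformers==4.26.1 datasets==2.9.0 tokenizers==0.13.2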