Commit Graph

144 Commits

Author SHA1 Message Date
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies
2023-02-06 18:10:56 -05:00
amyeroberts
e5db7051a8
Add TF image classification example script (#19956)
* TF image classification script

* Update requirements

* Fix up

* Add tests

* Update test fetcher
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix directory path

* Adding `zero-shot-object-detection` pipeline doctest. (#20274)

* Adding `zero-shot-object-detection` pipeline doctest.

* Remove nested_simplify.

* Add generate kwargs to `AutomaticSpeechRecognitionPipeline` (#20952)

* Add generate kwargs to AutomaticSpeechRecognitionPipeline

* Add test for generation kwargs

* Trigger CI

* Data collator returns np

* Update feature extractor -> image processor

* Bug fixes - updates to reflect changes in API

* Update flags to match PT & run faster

* Update instructions - Maria's comment

* Update examples/tensorflow/image-classification/README.md

* Remove slow decorator

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2023-02-01 19:09:36 +00:00
Matt
071529bd54
Use return_tensors="np" instead of "tf" (#21266)
Return NP instead of TF tensors for our data loading pipeline
2023-01-24 13:37:49 +00:00
Sylvain Gugger
7119bb052a
v4.27.0.dev0 2023-01-23 16:52:35 -05:00
Roy Hvaara
35a7052b61
[NumPy] Remove references to deprecated NumPy type aliases (#21022)
[NumPy] Remove references to deprecated NumPy type aliases.

This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).

NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.

Co-authored-by: Peter Hawkins <phawkins@google.com>

Co-authored-by: Peter Hawkins <phawkins@google.com>
2023-01-05 13:02:10 -05:00
Sylvain Gugger
60d1f31bb0
v4.26.0.dev0 2022-12-01 16:19:33 -05:00
Sylvain Gugger
a3f7458066
Pin to the right version... 2022-11-18 07:12:55 -05:00
Sylvain Gugger
06886d5a68
Only resize embeddings when necessary (#20043)
* Only resize embeddings when necessary

* Add comment
2022-11-03 12:05:04 -04:00
Sylvain Gugger
c3a93d8d82
v4.25.0.dev0 2022-10-31 21:48:40 -04:00
Lysandre
10100979ed Dev version 2022-10-10 17:25:40 -04:00
Matt
83dc6377d0
Reduce LR for TF MLM example test (#19156) 2022-09-22 08:51:27 -04:00
Lysandre
16913b3c92 Dev version 2022-09-14 14:58:20 -04:00
Matt
6eb51450fa
TF Examples Rewrite (#18451)
* Finished QA example

* Dodge a merge conflict

* Update text classification and LM examples

* Update NER example

* New Keras metrics WIP, fix NER example

* Update NER example

* Update MC, summarization and translation examples

* Add XLA warnings when shapes are variable

* Make sure batch_size is consistently scaled by num_replicas

* Add PushToHubCallback to all models

* Add docs links for KerasMetricCallback

* Add docs links for prepare_tf_dataset and jit_compile

* Correct inferred model names

* Don't assume the dataset has 'lang'

* Don't assume the dataset has 'lang'

* Write metrics in text classification

* Add 'framework' to TrainingArguments and TFTrainingArguments

* Export metrics in all examples and add tests

* Fix training args for Flax

* Update command line args for translation test

* make fixup

* Fix accidentally running other tests in fp16

* Remove do_train/do_eval from run_clm.py

* Remove do_train/do_eval from run_mlm.py

* Add tensorflow tests to circleci

* Fix circleci

* Update examples/tensorflow/language-modeling/run_mlm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/test_tensorflow_examples.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/translation/run_translation.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/token-classification/run_ner.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix save path for tests

* Fix some model card kwargs

* Explain the magical -1000

* Actually enable tests this time

* Skip text classification PR until we fix shape inference

* make fixup

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2022-08-10 16:49:51 +01:00
Julien Chaumond
9129fd0377
transformers-cli login => huggingface-cli login (#18490)
* zero chance anyone's using that constant no?

* `transformers-cli login` => `huggingface-cli login`

* `transformers-cli repo create` => `huggingface-cli repo create`

* `make style`
2022-08-06 09:42:55 +02:00
Julien Chaumond
8d1f9039d0
Just re-reading the whole doc every couple of months 😬 (#18489)
* Delete valohai.yaml

* NLP => ML

* typo

* website supports https

* datasets

* 60k + modalities

* unrelated link fixing for accelerate

* Ok those links were actually broken

* Fix link

* Make `AutoTokenizer` auto-link

* wording tweak

* add at least one non-nlp task
2022-08-06 09:38:55 +02:00
Sylvain Gugger
941d233153
Fix ROUGE add example check and update README (#18398)
* Fix ROUGE add example check and update README

* Stay consistent in values
2022-08-01 11:14:49 -04:00
Sylvain Gugger
986526a0e4
Replace as_target context managers by direct calls (#18325)
* Preliminary work on tokenizers

* Quality + fix tests

* Treat processors

* Fix pad

* Remove all uses of  in tests, docs and examples

* Replace all as_target_tokenizer

* Fix tests

* Fix quality

* Update examples/flax/image-captioning/run_image_captioning_flax.py

Co-authored-by: amyeroberts <amy@huggingface.co>

* Style

Co-authored-by: amyeroberts <amy@huggingface.co>
2022-07-29 08:09:09 -04:00
Vijay S Kalmath
a2586795e5
Migrate metric to Evaluate library for tensorflow examples (#18327)
* Migrate metric to Evaluate library in tf examples

Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.

Fix for #18306

* Migrate metric to Evaluate library in tf examples

Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.

Fix for #18306

* Migrate `metric` to Evaluate for all tf examples

Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.
2022-07-28 14:24:27 -04:00
Lysandre
c89a592e87 Dev version 2022-07-27 17:13:57 +02:00
John Giorgi
fde22c75a1
Add summarization name mapping for MultiNews (#18117)
* Add summarization name mapping for MultiNews

* Add summarization name mapping for MultiNews
2022-07-13 08:19:20 -04:00
Sylvain Gugger
7c6ec195ad v4.21.0.dev0 2022-06-16 12:20:53 -04:00
Sylvain Gugger
3cab90279f
Add examples telemetry (#17552)
* Add examples telemetry

* Alternative approach

* Add to all other examples

* Add to templates as well

* Put framework separately

* Same for TensorFlow
2022-06-07 11:57:52 -04:00
Sylvain Gugger
afe5d42d8d
Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
Lysandre Debut
5294fa12ee Dev version 2022-05-12 11:04:23 -04:00
Leonid Boytsov
c82e017aa9
Misc. fixes for Pytorch QA examples: (#16958)
1. Fixes evaluation errors popping up when you train/eval on squad v2 (one was newly encountered and one that was previously reported Running SQuAD 1.0 sample command raises IndexError #15401 but not completely fixed).
2. Removes boolean arguments that don't use store_true. Please, don't use these: *ANY non-empty string is being converted to True in this case and this clearly is not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts are now saving metric values in the same way (with the right prefix eval_), which is consistent with the trainer-based versions.
4. Adds forgotten model.eval() in the no-trainer versions. This improved some results, but not everything (see the discussion in the end). Please, see the F1 scores and the discussion below.
2022-04-27 08:51:39 -04:00
Wonjae Kim
b74a955325
fix rum_clm.py seeking text column name twice (#16624) 2022-04-19 14:38:25 +01:00
Lysandre Debut
a180efe7fd Dev version 2022-04-06 11:08:12 -04:00
Karim Foda
24a85cca61
Add use_auth to load_datasets for private datasets to PT and TF examples (#16521)
* fix formatting and remove use_auth

* Add use_auth_token to Flax examples
2022-04-04 10:27:45 -04:00
Stas Bekman
a73281e3e4
[examples] max samples can't be bigger than the len of dataset (#16501)
* [examples] max samples can't be bigger than then len of dataset

* do tf and flax
2022-03-30 12:33:16 -07:00
Sylvain Gugger
088c1880b7
Big file_utils cleanup (#16396)
* Big file_utils cleanup

* This one still needs to be treated separately
2022-03-25 07:25:20 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Lysandre Debut
eca77f4719
Updates the default branch from master to main (#16326)
* Updates the default branch from master to main

* Links from `master` to `main`

* Typo

* Update examples/flax/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-23 03:46:59 -04:00
Joao Gante
e7f34ccd4f
Swag example: Update doc format (#16014) 2022-03-09 13:25:34 +00:00
Joao Gante
62d847602a
Update TF multiple choice example (#15868) 2022-03-08 13:16:34 +00:00
Sylvain Gugger
79d28e80b6 v4.18.0.dev.0 2022-03-03 10:19:58 -05:00
Joao Gante
05c237ea94
Update TF QA example (#15870) 2022-03-02 10:38:13 +00:00
Joao Gante
3f2e636850
Update TF LM examples (#15855) 2022-03-01 14:12:58 +00:00
Joao Gante
3956b133b6
TF text classification examples (#15704)
* Working example with to_tf_dataset

* updated text_classification

* more comments
2022-02-21 17:17:59 +00:00
Sylvain Gugger
d0b5ed110a
Harder check for IndexErrors in QA scripts (#15438)
* Harder check for IndexErrors in QA scripts

* Make test stronger
2022-02-01 15:49:13 -05:00
Lysandre
eab338104d Docs for version v4.16.0 2022-01-27 13:11:51 -05:00
Lysandre
f87db5e412 Release: v4.16.0 2022-01-27 13:06:33 -05:00
Russell Klopfer
27b819b0e3
use block_size instead of max_seq_length in tf run_clm example (#15036)
* use block_size instead of max_seq_length

* fixup

* remove pad_to_block_size

Co-authored-by: Russell Klopfer <russell@kloper.us>
2022-01-12 08:57:00 -05:00
Patrick von Platen
fa39ff9fc4 Docs for v4.16.0dev0 2021-12-22 20:39:44 +01:00
Patrick von Platen
05fa1a7ac1 Release: v4.15.0 2021-12-22 18:43:15 +01:00
Lysandre
7c9c41f43c Docs for v4.14.0 2021-12-15 18:29:53 +01:00
Lysandre
960d8cb41d Release: v4.14.0 2021-12-15 18:20:35 +01:00
Lysandre
ab31b3e41b Docs for v4.14.0dev0 2021-12-09 17:09:23 +01:00
Lysandre
4da3a696e4 Release: v4.13.0 2021-12-09 16:55:21 +01:00
Julien Chaumond
6cdc3a7844
[urls to hub] Replace outdated model tags with their now-canonical pipeline types (#14617)
* Replace outdated model tags with their now-canonical pipeline types

* spam the CI till it's green
2021-12-06 04:35:01 -05:00
Nicholas Broad
69e16abf98
Switch from using sum for flattening lists of lists in group_texts (#14472)
* remove sum for list flattening

* change to chain(*)

* make chain object a list

* delete empty lines

per sgugger's suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-22 16:17:26 -05:00
Matt
267867e851
Quick fix to TF summarization example (#14401) 2021-11-15 13:45:51 +00:00
Matt
7f20bf0d43
Fixing requirements for TF LM models and use correct model mappings (#14372)
* Fixing requirements for TF LM models and use correct model mappings

* make style
2021-11-11 15:34:00 +00:00
Lysandre
b8fad022a0 v4.13.0.dev0 2021-10-28 12:56:46 -04:00
Lysandre
62bf536631 Release v4.12.0 2021-10-28 12:09:49 -04:00
Christopher Akiki
f9c16b02e3
Replace "Masked" with "Causal" in TF CLM example (#14014) 2021-10-21 16:19:30 +01:00
Dhananjay Shettigar
319beb64eb
#12789 Replace assert statements with exceptions (#13909)
* #12789 Replace assert statements with exceptions

* fix-copies: made copy changes to utils_qa.py in examples/pytorch/question-answering and examples/tensorflow/question-answering

* minor refactor for clarity
2021-10-07 09:09:01 -04:00
Lysandre
11c69b8045 Docs for version v4.11.0 2021-09-27 14:19:38 -04:00
Lysandre
dc193c906d Release: v4.11.0 2021-09-27 14:14:09 -04:00
Lysandre
5ee67a4412 Docs for v4.10.0 2021-08-31 16:02:31 +02:00
Lysandre
d12bbe4942 Release: v4.10.0 2021-08-31 15:53:10 +02:00
Matt
702f4a49cd
Fixed CLM model still using MODEL_FOR_MASKED_LM_MAPPING (#13002) 2021-08-31 13:21:39 +01:00
Sylvain Gugger
139e830158
Update label2id in the model config for run_glue (#13334) 2021-08-30 10:35:09 -04:00
Stefan Schweter
4046e66e40
examples: only use keep_linebreaks when reading TXT files (#13320)
* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples
2021-08-28 16:22:29 +02:00
Stefan Schweter
319d840b46
examples: add keep_linebreaks option to CLM examples (#13150)
* examples: add keep_linebreaks option to text dataset loader for all CLM examples

* examples: introduce new keep_linebreaks option as data argument in CLM examples
2021-08-27 11:35:45 +02:00
Sylvain Gugger
3ec851dc5e
Fix QA examples for roberta tokenizer (#12928) 2021-07-28 09:47:49 -04:00
Elysium1436
f3d0866ed9
Correct validation_split_percentage argument from int (ex:5) to float (0.05) (#12897)
* Fixed train_test_split test_size argument

* `Seq2SeqTrainer` set max_length and num_beams only when non None  (#12899)

* set max_length and num_beams only when non None

* fix instance variables

* fix code style

* [FLAX] Minor fixes in CLM example (#12914)

* readme: fix retrieval of vocab size for flax clm example

* examples: fix flax clm example when using training/evaluation files

* Fix module path for symbolic_trace example

Co-authored-by: cchen-dialpad <47165889+cchen-dialpad@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-07-27 21:01:40 -04:00
Matt
569f61a760
Add TF multiple choice example (#12865)
* Add new multiple-choice example, remove old one
2021-07-26 15:15:51 +01:00
Lysandre
40de2d5a4f Docs for v4.10.0dev0 2021-07-22 12:52:25 +02:00
Lysandre
72aee83ced Release: v4.9.0 2021-07-22 12:11:55 +02:00
Matt
f9ac677eba
Update TF examples README (#12703)
* Update Transformers README, rename token_classification example to token-classification to be consistent with the others

* Update examples/tensorflow/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add README for TF token classification

* Update examples/tensorflow/token-classification/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/tensorflow/token-classification/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-14 15:15:25 +01:00
Matt
65bf05cd18
Adding TF translation example (#12667)
* Adding TF translation example

* Fixes and style pass for TF translation example

* Remove unused postprocess_text copied from run_summarization

* Adding README

* Review fixes

* Move changes to model.config to after we've initialized the model
2021-07-13 19:08:25 +01:00
Matt
379f649434
TF summarization example (#12617)
* Adding a TF summarization example

* Style pass

* Style fixes

* Updates for review comments

* Adding README

* Style pass

* Remove unused import
2021-07-12 15:58:38 +01:00
Sylvain Gugger
6f1adc4334
Fix group_lengths for short datasets (#12558) 2021-07-08 07:23:41 -04:00
Matt
ea55675024
NER example for Tensorflow (#12469)
* NER example for Tensorflow

* Style pass

* Style pass

* Added metric computation on the evaluation set

* Style pass

* Fixed label masking

* Style pass

* Style pass
2021-07-05 15:42:18 +01:00
Souvic Chakraborty
d5b8fe3b90
Validation split added: custom data files @sgugger, @patil-suraj (#12407)
* Validation split added: custom data files

Validation split added in case of no validation file and loading custom data

* Updated documentation with custom file usage

Updated documentation with custom file usage

* Update README.md

* Update README.md

* Update README.md

* Made some suggested stylistic changes

* Used logger instead of print.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Made similar changes to add validation split

In case of a missing validation file, a validation split will be used now.

* max_train_samples to be used for training only

max_train_samples got misplaced, now corrected so that it is applied on training data only, not whole data.

* styled

* changed ordering

* Improved language of documentation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Improved language of documentation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fixed styling issue

* Update run_mlm.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-01 13:22:42 -04:00
Matt
7e22609e0f
Tensorflow LM examples (#12358)
* Tensorflow MLM example

* Add CLM example

* Style fixes, adding missing checkpoint code from the CLM example

* Fix TPU training, avoid massive dataset warnings

* Fix incorrect training length calculation for multi-GPU training

* Fix incorrect training length calculation for multi-GPU training

* Refactors and nitpicks from the review

* Style pass

* Adding README
2021-06-28 19:31:44 +01:00
Sylvain Gugger
276bc149d2 Fix copies 2021-06-28 12:26:40 -04:00
Sylvain Gugger
57461ac0b4
Add possibility to maintain full copies of files (#12312) 2021-06-28 10:02:53 -04:00
Stas Bekman
4a872caef4
remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Sylvain Gugger
2150dfed31 v4.9.0.dev0 2021-06-23 13:31:19 -04:00
Sylvain Gugger
9252a5127f Release: v4.8.0 2021-06-23 13:25:56 -04:00
Matt
e3cb7a0b60
Tensorflow QA example (#12252)
* New Tensorflow QA example!

* Style pass

* Updating README.md for the new example

* flake8 fixes

* Update examples/tensorflow/question-answering/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 16:37:28 +01:00
Lysandre
0daadc1919 Docs for v4.8.0 2021-06-17 18:17:42 +02:00
Lysandre
7a6c9fab8e Release: v4.7.0 2021-06-17 17:57:42 +02:00
Matt
4cda08decb
Minor style edits 2021-06-10 15:10:57 +01:00
Matt
7f08dbd10a
Update README.md to cover the TF GLUE example. 2021-06-10 14:33:42 +01:00
Matt
73a532651a
New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example

* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.

* Fix to the fit() call

* Bugfixes, making sure TPU and multi-GPU support is ready

* Remove logger line that depends on Pytorch

* Style pass

* Deleting old TF GLUE example

* Include label2id and id2label in the saved model config

* Don't clobber the existing model.config.label2id

* Style fixes

* Update examples/tensorflow/text-classification/run_glue.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-10 14:14:37 +01:00
Matt
ef8d32c5ea
Fix suggested by @bhadreshpsavani (#11660) 2021-05-10 14:28:04 +01:00
Matt
20d6931e32
Update TF text classification example (#11496)
Big refactor, fixes and multi-GPU/TPU support
2021-04-30 13:45:33 +01:00
Bhadresh Savani
1d30ec95c7
[Examples] Fixes inconsistency around eval vs val and predict vs test (#11380)
* added changes for uniformity

* modified files

* corrected typo

* fixed qa scripts

* fix typos

* fixed predict typo in qa no trainer

* fixed test file

* reverted trainer changes

* reverted trainer changes in custom exmaples

* updated readme

* added changes in deepspeed test

* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Matt
2617396094
Correctly cast num_train_epochs to int (#11379) 2021-04-22 13:49:59 +01:00
Matt
6fe79e57d7
Move old TF text classification script to legacy (#11361)
And update README to explain the work-in-progress!
2021-04-21 17:36:18 +01:00
Matt
ac588594e2
Merge new TF example script (#11360)
First of the new and more idiomatic TF examples!
2021-04-21 17:04:55 +01:00
Sylvain Gugger
dabeb15292
Examples reorg (#11350)
* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00