Sam Shleifer
08b59d10e5
MBartTokenizer: add language codes (#3776)
2020-06-11 13:02:33 -04:00
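The commit above adds language codes to MBartTokenizer. As a rough illustration of the idea only (not the library's actual implementation), an mBART-style tokenizer marks the translation direction by appending a language-code token to the tokenized source sequence; the ids below are made up for the example:

```python
# Hypothetical sketch of mBART-style language-code handling; the real
# MBartTokenizer in transformers differs in detail, and these ids are
# illustrative only.
LANG_CODES = {"en_XX": 250004, "ro_RO": 250020}
EOS_ID = 2

def build_source_ids(token_ids, src_lang):
    """Append EOS and then the source language code, mBART-style."""
    return token_ids + [EOS_ID, LANG_CODES[src_lang]]

print(build_source_ids([17, 48, 391], "en_XX"))  # → [17, 48, 391, 2, 250004]
```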
Sylvain Gugger
f1fe18465d
Use labels to remove deprecation warnings ( #4807 )
2020-06-05 16:41:46 -04:00
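The "Use labels to remove deprecation warnings" commit reflects the library-wide rename of loss-target kwargs (such as `lm_labels`) to `labels`. A generic deprecation shim for such a rename might look like this sketch (a toy function, not the library's code):

```python
import warnings

def forward(labels=None, lm_labels=None):
    """Toy forward pass showing a kwarg rename with a deprecation path."""
    if lm_labels is not None:
        warnings.warn("`lm_labels` is deprecated, use `labels` instead.",
                      FutureWarning)
        if labels is None:
            labels = lm_labels
    return labels

print(forward(lm_labels=[1, 2, 3]))  # old name still works, with a warning
```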
Julien Chaumond
b42586ea56
Fix CI after killing archive maps ( #4724 )
...
* 🐛 Fix model ids for BART and Flaubert
2020-06-02 10:21:09 -04:00
Julien Chaumond
d4c2cb402d
Kill model archive maps ( #4636 )
...
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
2020-06-02 09:39:33 -04:00
Sam Shleifer
b86e42e0ac
[ci] fix 3 remaining slow GPU failures ( #4584 )
2020-05-25 19:20:50 -04:00
Sam Shleifer
956c4c4eb4
[gpu slow tests] fix mbart-large-enro gpu tests ( #4472 )
2020-05-19 19:45:31 -04:00
Julien Chaumond
4bf5042240
Fix BART tests on GPU ( #4298 )
2020-05-12 09:11:50 -04:00
Sam Shleifer
18db92dd9a
[testing] add timeout_decorator ( #3543 )
2020-05-01 09:05:47 -04:00
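The commit above adds the PyPI `timeout_decorator` package to the test suite so hanging tests fail fast. As a simplified, Unix-only sketch of the same idea (not the package's implementation), a timeout can be enforced with a `SIGALRM` timer:

```python
import signal
from functools import wraps

def timeout(seconds):
    """Raise TimeoutError if the wrapped call exceeds `seconds` (Unix-only)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutError(f"{func.__name__} exceeded {seconds}s")
            old = signal.signal(signal.SIGALRM, handler)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
                signal.signal(signal.SIGALRM, old)
        return wrapper
    return decorator

@timeout(1.0)
def fast_test():
    return sum(range(10))

print(fast_test())  # → 45
```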
Julien Chaumond
f54dc3f4d5
[ci] Load pretrained models into the default (long-lived) cache
...
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models
When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.
I'd rather always use the default cache
2020-04-30 22:30:15 -04:00
Sam Shleifer
847e7f3379
MarianMTModel.from_pretrained('Helsinki-NLP/opus-marian-en-de') ( #3908 )
...
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
2020-04-28 18:22:37 -04:00
Sam Shleifer
7a7fdf71f8
Multilingual BART (#3602)
...
- support mbart-en-ro weights
- add MBartTokenizer
2020-04-10 11:25:39 -04:00
Sam Shleifer
715aa5b135
[Bart] Replace config.output_past with use_cache kwarg ( #3632 )
2020-04-07 19:08:26 -04:00
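Replacing `config.output_past` with a `use_cache` kwarg moves the decision from model construction time to call time. A minimal sketch of that pattern, with stand-in class names (not the library's code):

```python
# Hedged sketch of moving a flag from model config to a per-call kwarg,
# in the spirit of replacing config.output_past with `use_cache`.
class Config:
    use_cache = True  # model-level default

class TinyDecoder:
    def __init__(self, config):
        self.config = config

    def forward(self, input_ids, use_cache=None):
        # The per-call kwarg wins; otherwise fall back to the config default.
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        outputs = {"last_ids": input_ids}
        if use_cache:
            outputs["past_key_values"] = ("cached",)  # stand-in for a real cache
        return outputs

model = TinyDecoder(Config())
print("past_key_values" in model.forward([1, 2]))                  # config default
print("past_key_values" in model.forward([1, 2], use_cache=False)) # kwarg override
```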
Sam Shleifer
8deff3acf2
[bart-tiny-random] Put a 5MB model on S3 to allow faster exampl… ( #3488 )
2020-03-30 12:28:27 -04:00
Patrick von Platen
75ec6c9e3a
[T5] make decoder input ids optional for t5 training ( #3521 )
...
* make decoder input ids optional for t5 training
* lm_labels should not be shifted in t5
* add tests
* finish shift right functionality for PT T5
* move shift right to correct class
* cleaner code
* replace -100 values with pad token id
* add assert statement
* remove unnecessary for loop
* make style
2020-03-30 13:45:26 +02:00
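The bullets above describe T5's "shift right": decoder inputs are built from the labels by prepending the decoder start token, dropping the last label, and replacing the `-100` loss-ignore value with the pad token id. A pure-Python sketch of that logic on plain lists (the real implementation operates on tensors):

```python
# Pure-Python sketch of T5-style "shift right" over a list of label ids.
def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 marks positions ignored by the loss; it is not a valid token id,
    # so replace it with the pad token before feeding the decoder.
    return [pad_token_id if t == -100 else t for t in shifted]

print(shift_right([42, 7, -100, -100]))  # → [0, 42, 7, 0]
```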
Sam Shleifer
f6a23d1911
[BART] add bart-large-xsum weights ( #3422 )
2020-03-29 10:51:13 -04:00
Sam Shleifer
3ee431dd4c
[Bart/Memory] Two separate, smaller decoder attention masks ( #3371 )
2020-03-26 21:34:15 -04:00
Sam Shleifer
39371ee454
[Bart/Memory] don't create lm_head ( #3323 )
...
* delete lm_head, skips weight tying
* Fixed s3
2020-03-26 18:40:39 -04:00
Patrick von Platen
95e00d0808
Clean special token init in modeling_....py ( #3264 )
...
* make style
* fix conflicts
2020-03-20 21:41:04 +01:00
Patrick von Platen
bbf26c4e61
Support T5 Generation ( #3228 )
...
* fix conflicts
* update bart max length test
* correct spelling mistakes
* implemented model specific encode function
* fix merge conflicts
* better naming
* save intermediate state -> need to rethink structure a bit
* leave tf problem as it is for now
* current version
* add layers.pop
* remove ipdb
* make style
* clean return cut decoding
* remove ipdbs
* Fix restoring layers in the decoders that don't exist.
* push good intermediate solution for now
* fix conflicts
* always good to refuse to merge conflicts when rebasing
* fix small bug
* improve function calls
* remove unused file
* add correct scope behavior for t5_generate
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-03-19 23:18:23 +01:00
Sam Shleifer
ad7233fc01
[BART] cleanup: remove redundant kwargs, improve docstrings ( #3319 )
2020-03-19 11:16:51 -04:00
Patrick von Platen
e8f44af5bf
[generate] do_sample default back to False ( #3298 )
...
* change do_sample back
* None better default as boolean
* adapt do_sample to True in test example
* make style
2020-03-17 10:52:37 -04:00
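Flipping `do_sample` back to `False` makes greedy decoding the default. A toy next-token step shows why that is the reproducible choice (hypothetical helper, not the library's `generate`): greedy argmax is deterministic, while sampling draws from the distribution:

```python
import random

# Sketch of the do_sample switch in a single decoding step.
def pick_next_token(probs, do_sample=False, rng=None):
    if do_sample:
        rng = rng or random.Random()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda i: probs[i])  # greedy argmax

probs = [0.1, 0.7, 0.2]
print(pick_next_token(probs))  # → 1, every time (greedy)
```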
Sam Shleifer
b2c1a447fe
[BART] Delete redundant unit test ( #3302 )
2020-03-16 23:09:10 -04:00
Sam Shleifer
5ea8ba67b4
[BART] Remove unused kwargs ( #3279 )
...
* Remove unused kwargs
* dont call forward in tests
2020-03-15 23:00:44 -04:00
Thomas Wolf
3814e167d9
Merge pull request #3225 from patrickvonplaten/finalize_merge_bart_generate_into_default_generate
...
Complete merge Seq-2-Seq generation into default generation
2020-03-14 15:08:59 +01:00
Sam Shleifer
2bd79e23de
[BART] FP16 testing fixes ( #3266 )
2020-03-13 19:48:26 -04:00
Patrick von Platen
6a82f774f2
fix typo
2020-03-12 21:10:51 +01:00
Patrick von Platen
f1c71da115
fix eos_token_ids in test
2020-03-12 21:00:54 +01:00
Patrick von Platen
6047f46b19
re-add eos token to get good bart results
2020-03-12 20:17:50 +01:00
Patrick von Platen
ac303eae46
fix problem with half
2020-03-11 12:24:30 +01:00
Patrick von Platen
bc9d5d917c
make all tensors half precision
2020-03-11 12:15:38 +01:00
Patrick von Platen
a332cc9f7f
finalize generation merge
2020-03-11 11:53:36 +01:00
Patrick von Platen
7351a8dbaf
re-add scoring filtering
2020-03-11 11:06:56 +01:00
Patrick von Platen
374deef48d
fixed typo
2020-03-11 11:06:56 +01:00
patrickvonplaten
41b437ea3a
add draft version of proposed changes for ROUGE score
2020-03-11 11:06:56 +01:00
patrickvonplaten
a5751f7578
fix bug with attention_mask as optional input argument
2020-03-11 11:06:56 +01:00
patrickvonplaten
d880a5fbde
finalized PR
2020-03-11 11:06:56 +01:00
patrickvonplaten
2acfe63964
best current version and make style
2020-03-11 11:06:56 +01:00
patrickvonplaten
c62444da39
fix conflicts
2020-03-11 11:06:56 +01:00
Patrick von Platen
77e6775065
add current changes
2020-03-11 11:06:56 +01:00
Patrick von Platen
421216997b
comment out stuff
2020-03-11 11:06:56 +01:00
Patrick von Platen
7a11e925cf
work in progress
2020-03-11 11:06:56 +01:00
Patrick von Platen
aceb3fbaf4
only do output_past=True for language generation in bart
2020-03-11 11:06:56 +01:00
Patrick von Platen
7cba11fb9b
better naming
2020-03-11 11:06:56 +01:00
Patrick von Platen
ff648221bd
fix conflicts
2020-03-11 11:06:56 +01:00
Patrick von Platen
c0d9dd3ba9
refactored code a bit and made more generic
2020-03-11 11:06:56 +01:00
Patrick von Platen
d8e2b3c547
fix conflicts
2020-03-11 11:06:56 +01:00
Sam Shleifer
ed37f9fa4f
[Bart] _prepare_decoder_inputs should use large negative ( #3158 )
2020-03-06 16:06:36 -05:00
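The motivation for a "large negative" (rather than `-inf`) in decoder masking is numerical: `-inf` can produce NaNs in half precision, while a large finite negative value still drives the softmax weight of masked positions to approximately zero. A small self-contained sketch of that idea (illustrative mask-building, not the library's `_prepare_decoder_inputs`):

```python
import math

# Causal mask using a large finite negative instead of -inf for fp16 safety.
def causal_mask(seq_len, masking_value=-1e4):
    return [[0.0 if j <= i else masking_value for j in range(seq_len)]
            for i in range(seq_len)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

mask = causal_mask(3)
# Attention weights for the first query position: only position 0 is visible.
weights = softmax([1.0 + mask[0][j] for j in range(3)])
print(round(weights[0], 4))  # masked-out positions get ~0 weight
```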
patrickvonplaten
58fc8f97a3
fix renaming problem
2020-03-06 00:35:47 +01:00
Sam Shleifer
857e0a0d3b
Rename BartForMaskedLM -> BartForConditionalGeneration ( #3114 )
...
* improved documentation
2020-03-05 17:41:18 -05:00
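One common way to handle a class rename like BartForMaskedLM -> BartForConditionalGeneration is a deprecated alias that warns on construction. The sketch below reuses the names purely for illustration and is not the library's actual shim:

```python
import warnings

class BartForConditionalGeneration:
    """Toy stand-in for the renamed class."""
    def generate(self):
        return "summary"

class BartForMaskedLM(BartForConditionalGeneration):
    """Deprecated alias kept for backward compatibility (illustrative only)."""
    def __init__(self, *args, **kwargs):
        warnings.warn("BartForMaskedLM is renamed to "
                      "BartForConditionalGeneration.", FutureWarning)
        super().__init__(*args, **kwargs)

old = BartForMaskedLM()   # still works, but warns
print(old.generate())     # → summary
```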
sshleifer
1360dacaa3
cleanup deltas
2020-03-05 12:57:42 -05:00
sshleifer
c36fdc88d4
tests pass
2020-03-05 12:33:08 -05:00
Sam Shleifer
e9e6efdc45
BartForSequenceClassification: fix num_labels, add test ( #3110 )
2020-03-03 15:54:29 -05:00
Sam Shleifer
b54ef78d0c
Bart-CNN ( #3059 )
...
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
2020-03-02 10:35:53 -05:00
Julien Chaumond
f5516805c2
Fix bart slow test
2020-02-26 20:47:49 +00:00
Julien Chaumond
9cda3620b6
Fix (non-slow) tests on GPU (torch) ( #3024 )
...
* Fix tests on GPU (torch)
* Fix bart slow tests
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-02-26 11:59:25 -05:00
Sam Shleifer
92487a1dc0
Bart: fix layerdrop and cached decoder_input_ids for generation ( #2969 )
2020-02-22 16:25:04 -05:00
Sam Shleifer
53ce3854a1
New BartModel ( #2745 )
...
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00