transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 13:20:12 +06:00

Author	SHA1	Message	Date
Patrick von Platen	961732c276	[Wav2Vec2] PyCTCDecode Integration to support language model boosted decoding (#14339 ) * up * up * up * make it cleaner * correct * make styhahalal * add more tests * finish * small fix * make style * up * tryout to solve cicrle ci * up * fix more tests * fix more tests * apply sylvains suggestions * fix import * correct docs * add pyctcdecode only to speech tests * fix more tests * add tf, flax and pt tests * add pt * fix last tests * fix more tests * Apply suggestions from code review * change lines * Apply suggestions from code review Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com> * correct tests * correct tests * add doc string Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>	2021-12-08 12:07:54 +01:00
Nik	6645eb61fa	fix #14524 (IndexError when mask prob is too low) (#14525 ) * fix #14524 (IndexError when mask prob is too low) * fix formatting * correct documentation, add option for setting min_num_masks * change the semantic meaning of `mask_prob` in _compute_mask_indices With this commit the meaing of `mask_prob` actually adhered to the probability for each vector to be the start of a masked span of length. * fix check_copies test * fix documentation to semantic meaning of `upper bound of overall masking percentage`, revert changes to _compute_mask_indices * fix typo	2021-12-02 17:05:31 +03:00
Patrick von Platen	700a748fe6	[Wav2Vec2] Add New Wav2Vec2 Translation (#14392 ) * add new wav2vec2 translation * correct * up * add tests * correct end copy * correct more * up * correct unispeech sat * finish * finalize * finish * up	2021-11-17 14:38:56 +01:00
Sylvain Gugger	040fd47162	Fix gradient_checkpointing backward compatibility (#14408 ) * Fix gradient_checkpointing backward compatibility * Remove needless line * make sure mask prob is big enough and length small enough * Fix tests Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>	2021-11-16 08:58:42 -05:00
Patrick von Platen	0c3174c758	Add TF<>PT and Flax<>PT everywhere (#14047 ) * up * up * up * up * up * up * up * add clip * fix clip PyTorch * fix clip PyTorch * up * up * up * up * up * up * up	2021-10-25 23:55:08 +02:00
Patrick von Platen	bdf31d6e0a	[Speech] Move all examples to new audio feature (#14045 ) * up * up * up * finish	2021-10-18 12:52:40 +02:00
Patrick von Platen	7fb2a8b3d9	up (#14008 )	2021-10-14 15:46:22 +02:00
Patrick von Platen	58bf882579	[Wav2Vec2] Make sure tensors are always bool for mask_indices (#13977 ) * correct long to bool * up * correct code	2021-10-12 18:17:06 +02:00
Patrick von Platen	d45fc7da3d	[Speech Examples] Add pytorch speech pretraining (#13877 ) * adapt wav2vec2 * add example * add files * adapt * remove bogus file * Apply suggestions from code review * adapt files more * upload changes * del old files * up * up * up * up * up * correct gradient checkpoitning * add readme * finish * finish * up * more fixes * up * up * add demo run to readme * up	2021-10-12 00:46:32 +02:00
Patrick von Platen	0f5488f79f	[Wav2Vec2] Fix mask_feature_prob (#13921 ) * up * overwrite hubert	2021-10-07 19:07:32 +03:00
Patrick von Platen	a75db353c4	[Slow tests] Disable Wav2Vec2 pretraining test for now (#13303 ) * fix_torch_device_generate_test * remove @ * wav2vec2 pretraining Co-authored-by: Patrick von Platen <patrick@huggingface.co>	2021-08-30 06:03:02 -04:00
Anton Lozhkov	b6f332ecaf	Add Wav2Vec2 & Hubert ForSequenceClassification (#13153 ) * Add hubert classifier + tests * Add hubert classifier + tests * Dummies for all classification tests * Wav2Vec2 classifier + ER test * Fix hubert integration tests * Add hubert IC * Pass tests for all classification tasks on Hubert * Pass all tests + copies * Move models to the SUPERB org	2021-08-27 20:52:51 +03:00
Patrick von Platen	f6e254474c	[Sequence Feature Extraction] Add truncation (#12804 ) * fix_torch_device_generate_test * remove @ * add truncate * finish * correct test * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * clean tests * correct normalization for truncation * remove casting * up * save intermed * finish * finish * correct Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-07-23 17:53:30 +02:00
Lysandre Debut	c3d9ac7607	Expose get_config() on ModelTesters (#12812 ) * Expose get_config() on ModelTesters * Typo	2021-07-21 04:13:11 -04:00
Patrick von Platen	b4b562d834	[Wav2Vec2] Padded vectors should not allowed to be sampled (#12764 ) * fix_torch_device_generate_test * remove @ * finish * correct script * correct script	2021-07-16 19:07:08 +02:00
Patrick von Platen	2e9fb13fb1	[Wav2Vec2] Correctly pad mask indices for PreTraining (#12748 ) * fix_torch_device_generate_test * remove @ * start adding tests * correct wav2vec2 pretraining * up * up Co-authored-by: Patrick von Platen <patrick@huggingface.co>	2021-07-15 21:40:25 +01:00
Patrick von Platen	27d348f2fe	[Wav2Vec2, Hubert] Fix ctc loss test (#12458 ) * fix_torch_device_generate_test * remove @ * fix test	2021-07-01 08:59:32 -04:00
Will Rice	bc084938f2	Add out of vocabulary error to ASR models (#12288 ) * Add OOV error to ASR models * Feedback changes	2021-06-29 08:57:46 +01:00
Patrick von Platen	bc6f51e539	[Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089 ) * fix_torch_device_generate_test * remove @ * fix tests	2021-06-09 20:41:59 +01:00
Anton Lozhkov	d472bd7b18	Wav2Vec2 Pretraining (#11306 ) * Working quantizer forward * Working quantizer forward * Clean up unused model parts, test reproducibility * Working quantizer forward * Clean up unused model parts, test reproducibility * Remove custom outputs from the shared ones * correct conversion * correct bug * add first pretrain script * save intermediate * static shapes * save intermediate * finish first pretrain script version * more refactor * remove wanddb * refactor more * improve test * correct perplexity compute bug * finish model implementation * add to docs * finish docs * finish pretraining script * finish pretraining script * remove wandb * finish PR for merge * finish config * finish * make deepspeed work * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * apply suggestions * fix flaky test Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-06-09 18:40:56 +01:00
Patrick von Platen	7630c11f32	[Wav2Vec2] SpecAugment Fast (#11764 ) * first try * finish	2021-05-25 13:59:52 +01:00
Patrick von Platen	3e3e41ae20	Pytorch - Lazy initialization of models (#11471 ) * lazy_init_weights * remove ipdb * save int * add necessary code * remove unnecessary utils * Update src/transformers/models/t5/modeling_t5.py * clean * add tests * correct * finish tests * finish tests * fix some more tests * fix xlnet & transfo-xl * fix more tests * make sure tests are independent * fix tests more * finist tests * final touches * Update src/transformers/modeling_utils.py * Apply suggestions from code review * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * clean tests * give arg positive name * add more mock weights to xlnet Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2021-05-05 17:22:20 +02:00
Sylvain Gugger	acc3bd9d2a	Enforce string-formatting with f-strings (#10980 ) * First third * Styling and fix mistake * Quality * All the rest * Treat %s and %d * typo * Missing ) * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-31 10:00:27 -04:00
Patrick von Platen	0486ccdd3d	small improvements (#10773 )	2021-03-17 18:10:17 +03:00
Patrick von Platen	d9e693e1d0	make wav2vec2 test deterministic (#10714 )	2021-03-15 09:50:05 -04:00
Patrick von Platen	0234de8418	Add Fine-Tuning for Wav2Vec2 (#10145 ) * add encode labels function to tokenizer * start adding finetuning * init dropout * upload * correct convert script * apply changes * fix second typo * make first dummy training run * adapt convert script * push confg for comparison * remove conf * finish training * adapt data collator * add research folder * update according to fairseq feedback * some minor corrections * refactor masking indices a bit * some minor changes * clean tokenizer * finish clean-up * remove previous logic * update run script * correct training * finish changes * finish model * correct bug * fix training a bit more * add some tests * finish gradient checkpointing * finish example * correct gradient checkpointing * improve tokenization method * revert changes in tokenizer * revert general change * adapt fine-tuning * update * save intermediate test * Update README.md * finish finetuning * delete conversion script * Update src/transformers/models/wav2vec2/configuration_wav2vec2.py * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * finish wav2vec2 script * finish wav2vec2 fine-tuning * finalize test * correct test * adapt tests * finish * remove test file Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-01 12:13:17 +03:00
Patrick von Platen	cb38ffcc5e	[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324 ) * push to show * small improvement * small improvement * Update src/transformers/feature_extraction_utils.py * Update src/transformers/feature_extraction_utils.py * implement base * add common tests * make all tests pass for wav2vec2 * make padding work & add more tests * finalize feature extractor utils * add call method to feature extraction * finalize feature processor * finish tokenizer * finish general processor design * finish tests * typo * remove bogus file * finish docstring * add docs * finish docs * small fix * correct docs * save intermediate * load changes * apply changes * apply changes to doc * change tests * apply surajs recommend * final changes * Apply suggestions from code review * fix typo * fix import * correct docstring	2021-02-25 17:42:46 +03:00
Patrick von Platen	495c157d6f	[Wav2Vec2] Improve Tokenizer & Model for batched inference (#10117 ) * save intermediate * finish batch the same as fairseq * add normalization * fix batched input * add better comment * Update src/transformers/models/wav2vec2/modeling_wav2vec2.py * add nice docstring * add tokenizer tests * make all slow tests pass * finish PR * correct import	2021-02-11 15:40:54 +03:00
Patrick von Platen	b972125ced	Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC (#10089 ) * add wav2vec2CTC and deprecate for maskedlm * remove from docs	2021-02-09 03:49:02 -05:00
Patrick von Platen	d6217fb30c	Wav2Vec2 (#9659 ) * add raw scaffold * implement feat extract layers * make style * remove + * correctly convert weights * make feat extractor work * make feature extraction proj work * run forward pass * finish forward pass * Succesful decoding example * remove unused files * more changes * add wav2vec tokenizer * add new structure * fix run forward * add other layer norm architecture * finish 2nd structure * add model tests * finish tests for tok and model * clean-up * make style * finish docstring for model and config * make style * correct docstring * correct tests * change checkpoints to fairseq * fix examples * finish wav2vec2 * make style * apply sylvains suggestions * apply lysandres suggestions * change print to log.info * re-add assert statement * add input_values as required input name * finish wav2vec2 tokenizer * Update tests/test_tokenization_wav2vec2.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * apply sylvains suggestions Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-02-02 15:52:10 +03:00

30 Commits