Commit Graph

9 Commits

Author SHA1 Message Date
Sylvain Gugger
27d4639779
Make gradient_checkpointing a training argument (#13657)
* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2021-09-22 07:51:38 -04:00
Suraj Patil
010965dcde
[GPT-Neo] Simplify local attention (#13491)
* simplify local attention

* update tests

* add a comment and use torch.bitwise_xor
2021-09-10 22:52:20 +05:30
Lysandre Debut
c3d9ac7607
Expose get_config() on ModelTesters (#12812)
* Expose get_config() on ModelTesters

* Typo
2021-07-21 04:13:11 -04:00
Bhadresh Savani
e1205e478a
Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes

* fix tests
2021-05-28 06:27:02 -04:00
Michael Benayoun
f4a0d6ff86
A cleaner and more scalable implementation of symbolic tracing (#11763)
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-05-20 18:02:29 +02:00
Suraj Patil
cfd2eaa8cf
[GPTNeo] create local attention mask ones (#11335)
* create local attention mask ones

* remove old method, address patricks comment
2021-04-20 18:37:44 +05:30
Suraj Patil
2a8115f083
[WIP] GPT Neo cleanup (#10985)
* better names

* add attention mixin

* all slow tests in one class

* make helper methods static so we can test

* add local attention tests

* better names

* doc

* apply review suggestions
2021-04-06 12:24:15 -04:00
Suraj Patil
83d38c9ff3
GPT Neo few fixes (#10968)
* fix checkpoint names

* auto model

* fix doc
2021-03-30 11:15:55 -04:00
Suraj Patil
860264379f
GPT Neo (#10848)
* lets begin

* boom boom

* fix out proj in attn

* fix attention

* fix local attention

* add tokenizer

* fix imports

* autotokenizer

* fix checkpoint name

* cleanup

* more clean-up

* more cleanup

* output attentions

* fix attn mask creation

* fix imports

* config doc

* add tests

* add slow tests

* quality

* add conversion script

* copyright

* typo

* another bites the dust

* fix attention tests

* doc

* add embed init in convert function

* fix copies

* remove tokenizer

* enable caching

* address review comments

* improve config and create attn layer list internally

* more consistent naming

* init hf config from mesh-tf config json file

* remove neo tokenizer from doc

* handle attention_mask in local attn layer

* attn_layers => attention_layers

* add tokenizer_class in config

* fix docstring

* raise if len of attention_layers is not same as num_layers

* remove tokenizer_class from config

* more consistent naming

* fix doc

* fix checkpoint names

* fp16 compat

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 09:42:30 -04:00