Suraj Patil | 7a259c190c | 2021-07-06 18:55:18 +05:30
FlaxGPTNeo (#12493)
* flax gpt neo
* fix query scaling
* update generation test
* use flax model for test

Bhadresh Savani | e1205e478a | 2021-05-28 06:27:02 -04:00
Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes
* fix tests

Sylvain Gugger | 74712e22f3 | 2021-04-21 09:47:27 -04:00
Honor contributors to models (#11329)
* Honor contributors to models
* Fix typo
* Address review comments
* Add more authors

Stas Bekman | 520198f56f | 2021-04-06 16:42:06 -04:00
[doc] gpt-neo (#11098)
* make the example work

Suraj Patil | 83d38c9ff3 | 2021-03-30 11:15:55 -04:00
GPT Neo few fixes (#10968)
* fix checkpoint names
* auto model
* fix doc

Suraj Patil | 860264379f | 2021-03-30 09:42:30 -04:00
GPT Neo (#10848)
* lets begin
* boom boom
* fix out proj in attn
* fix attention
* fix local attention
* add tokenizer
* fix imports
* autotokenizer
* fix checkpoint name
* cleanup
* more clean-up
* more cleanup
* output attentions
* fix attn mask creation
* fix imports
* config doc
* add tests
* add slow tests
* quality
* add conversion script
* copyright
* typo
* another bites the dust
* fix attention tests
* doc
* add embed init in convert function
* fix copies
* remove tokenizer
* enable caching
* address review comments
* improve config and create attn layer list internally
* more consistent naming
* init hf config from mesh-tf config json file
* remove neo tokenizer from doc
* handle attention_mask in local attn layer
* attn_layers => attention_layers
* add tokenizer_class in config
* fix docstring
* raise if len of attention_layers is not same as num_layers
* remove tokenizer_class from config
* more consistent naming
* fix doc
* fix checkpoint names
* fp16 compat
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>