Suraj Patil
cfd2eaa8cf
[GPTNeo] create local attention mask ones (#11335)
* create local attention mask ones
* remove old method, address Patrick's comment
2021-04-20 18:37:44 +05:30
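(For context on the title above: a local, sliding-window causal attention mask is typically built from a matrix of ones and then trimmed to the causal band. A minimal PyTorch sketch, assuming an illustrative window size and helper name rather than the actual GPTNeo code:)

    import torch

    def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
        # Hypothetical helper, not the transformers implementation.
        # Start from all ones, then keep only the causal band: each
        # position attends to itself and at most `window - 1`
        # preceding positions.
        ones = torch.ones(seq_len, seq_len, dtype=torch.bool)
        causal = torch.tril(ones)  # mask out future positions
        return torch.triu(causal, diagonal=-(window - 1))  # limit lookback

    print(local_causal_mask(seq_len=6, window=3).int())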
Suraj Patil
2a8115f083
[WIP] GPT Neo cleanup (#10985)
* better names
* add attention mixin
* all slow tests in one class
* make helper methods static so we can test
* add local attention tests
* better names
* doc
* apply review suggestions
2021-04-06 12:24:15 -04:00
Suraj Patil
83d38c9ff3
GPT Neo few fixes (#10968)
* fix checkpoint names
* auto model
* fix doc
2021-03-30 11:15:55 -04:00
Suraj Patil
860264379f
GPT Neo (#10848)
* lets begin
* boom boom
* fix out proj in attn
* fix attention
* fix local attention
* add tokenizer
* fix imports
* autotokenizer
* fix checkpoint name
* cleanup
* more clean-up
* more cleanup
* output attentions
* fix attn mask creation
* fix imports
* config doc
* add tests
* add slow tests
* quality
* add conversion script
* copyright
* typo
* another bites the dust
* fix attention tests
* doc
* add embed init in convert function
* fix copies
* remove tokenizer
* enable caching
* address review comments
* improve config and create attn layer list internally
* more consistent naming
* init hf config from mesh-tf config json file
* remove neo tokenizer from doc
* handle attention_mask in local attn layer
* attn_layers => attention_layers
* add tokenizer_class in config
* fix docstring
* raise if len of attention_layers is not the same as num_layers
* remove tokenizer_class from config
* more consistent naming
* fix doc
* fix checkpoint names
* fp16 compat
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-30 09:42:30 -04:00
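(Two bullets above, "improve config and create attn layer list internally" and "raise if len of attention_layers is not the same as num_layers", describe expanding a compact per-layer attention spec into one entry per layer and validating its length. A minimal sketch of that pattern, with hypothetical names rather than the actual GPTNeoConfig code:)

    def expand_attention_types(attention_types, num_layers):
        # Expand a compact spec such as [[["global", "local"], 12]]
        # into a flat per-layer list (24 alternating entries there).
        attention_layers = []
        for block, repeats in attention_types:
            for _ in range(repeats):
                attention_layers.extend(block)
        # Fail loudly if the spec does not cover every layer.
        if len(attention_layers) != num_layers:
            raise ValueError(
                f"attention_layers has {len(attention_layers)} entries, "
                f"but num_layers is {num_layers}"
            )
        return attention_layers

    print(expand_attention_types([[["global", "local"], 2]], num_layers=4))
    # ['global', 'local', 'global', 'local']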