Suraj Patil | 860264379f | GPT Neo (#10848)
* let's begin
* boom boom
* fix out proj in attn
* fix attention
* fix local attention
* add tokenizer
* fix imports
* autotokenizer
* fix checkpoint name
* cleanup
* more clean-up
* more cleanup
* output attentions
* fix attn mask creation
* fix imports
* config doc
* add tests
* add slow tests
* quality
* add conversion script
* copyright
* typo
* another one bites the dust
* fix attention tests
* doc
* add embed init in convert function
* fix copies
* remove tokenizer
* enable caching
* address review comments
* improve config and create attn layer list internally (see the config sketch after this entry)
* more consistent naming
* init hf config from mesh-tf config json file
* remove neo tokenizer from doc
* handle attention_mask in local attn layer
* attn_layers => attention_layers
* add tokenizer_class in config
* fix docstring
* raise if len of attention_layers is not the same as num_layers
* remove tokenizer_class from config
* more consistent naming
* fix doc
* fix checkpoint names
* fp16 compat (see the masking sketch after this entry)
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-30 09:42:30 -04:00
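
A minimal sketch of the config behavior described in the commits above ("improve config and create attn layer list internally" and the attention_layers length check). The class and helper names here (NeoConfigSketch, expand_attention_types_params) and the default attention_types value are assumptions modeled on the commit messages, not the exact transformers API:

    class NeoConfigSketch:
        def __init__(self, num_layers=24, attention_types=None):
            # Assumed default: alternate global and local attention,
            # repeated 12 times -> 24 layers.
            if attention_types is None:
                attention_types = [[["global", "local"], 12]]
            self.num_layers = num_layers
            self.attention_types = attention_types
            # Create the per-layer attention list internally from the compact spec.
            self.attention_layers = self.expand_attention_types_params(attention_types)
            # Raise if the expanded list does not match num_layers.
            if len(self.attention_layers) != self.num_layers:
                raise ValueError(
                    f"len(attention_layers) ({len(self.attention_layers)}) "
                    f"must equal num_layers ({self.num_layers})"
                )

        @staticmethod
        def expand_attention_types_params(attention_types):
            # Each entry pairs a pattern of layer types with a repeat count.
            attentions = []
            for pattern, repeats in attention_types:
                for _ in range(repeats):
                    attentions.extend(pattern)
            return attentions

With these defaults, NeoConfigSketch() builds a 24-entry ["global", "local", ...] list, while NeoConfigSketch(num_layers=12) raises because the expanded list still has 24 entries.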
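
The "fp16 compat" and "fix attn mask creation" commits suggest dtype-safe mask filling. A minimal sketch, assuming the common pattern of filling masked positions with the score dtype's own minimum so the value stays representable in half precision; masked_attn_scores is a hypothetical helper, not the model's actual code:

    import torch

    def masked_attn_scores(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # A hard-coded fill value like -1e9 can overflow in fp16; use the
        # smallest value the score dtype can represent instead.
        mask_value = torch.finfo(scores.dtype).min
        # Keep scores where mask is True, fill everywhere else.
        return torch.where(mask, scores, torch.full_like(scores, mask_value))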