* DataParallel fixes:
1. switched to a more precise check
- if self.args.n_gpu > 1:
+ if isinstance(model, nn.DataParallel):
2. fix tests - require the same fixup under DataParallel as the training module
* another fix
* Don't pass sampler for iterable dataset
* Added check for test and eval dataloaders.
* Formatting
* Don't pass sampler for iterable dataset
* Added check for test and eval dataloaders.
* Formatting
* Cleaner if nesting.
* Added test for trainer and iterable dataset
* Formatting for test
* Fixed import when torch is available only.
* Added require torch decorator to helper class
* Moved dataset class inside unittest
* Removed nested if and changed model in test
* Checking torch availability for IterableDataset
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overriden and always True.
* fix merge rebase
* add intermediate reformer code
* save intermediate caching results
* save intermediate
* save intermediate results
* save intermediate
* upload next step
* fix generate tests
* make tests work
* add named tuple output
* Apply suggestions from code review
* fix use_cache for False case
* fix tensor to gpu
* fix tensor to gpu
* refactor
* refactor and make style
* language tag addition on albert-mongolian
* Update model_cards/bayartsogt/albert-mongolian/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>