* test
* fix
* fix
* skip some and run some first
* test fsdp
* fix
* patches for generate
* test distributed
* copy
* don't test distributed loss for hpu
* require fp16 and run first
* changes from Marc's PR fixing zero3
* better alternative
* return True for fp16 support on Gaudi without creating a bridge
* fix
* fix tested dtype in deepspeed inference test
* test
* fix
* test
* fix
* skip
* require fp16
* run first fsdp
* Apply suggestions from code review
* address comments
* address comments and refactor test
* reduce precision
* avoid doing Gaudi1-specific stuff in the generation loop
* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* Add XLA torchrun support
* Clarify that DDP doesn't work with the torch.distributed XLA backend yet
* Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
* Add check for AWS Neuron availability and AWS Neuron specific compiler flag
* Change the new test's name to TestTrainerDistributedNeuronCore
* Remove "assert" and replace raised exception
* Remove compiler flag as it is optional. If needed, it will be added in another PR.
* Use TORCHELASTIC_RUN_ID to determine whether torchrun is used