* Fix multi-GPU loss sync condition, add doc and test
* Rename function and class
* Do not scale loss during inference
* Fix typo