[testing utils] get_auto_remove_tmp_dir more intuitive behavior (#8401)

* [testing utils] get_auto_remove_tmp_dir default change

  Now that I have been using `get_auto_remove_tmp_dir` for a while, I realized that the defaults aren't optimal. 99% of the time we want the tmp dir to be empty at the beginning of the test, so this changes the default to `before=True`. This shouldn't impact any tests since this feature is used only during debug.
* simplify things
* update docs
* fix doc layout
* style
* Update src/transformers/testing_utils.py

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* better 3-state doc
* style
* Apply suggestions from code review

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* s/tmp/temporary/ + style
* correct the statement

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2025-07-31 02:02:21 +06:00 · 2020-11-10 08:57:21 -08:00 · 2020-11-10 08:57:21 -08:00 · e21340da7a
commit e21340da7a
parent e7e1549895
2 changed files with 83 additions and 48 deletions
--- a/docs/source/testing.rst
+++ b/docs/source/testing.rst
@@ -716,11 +716,11 @@ Temporary files and directories
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite
-each other's data. Also we want to get the temp files and directories removed at the end of each test that created
+each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
 them. Therefore, using packages like ``tempfile``, which address these needs is essential.

-However, when debugging tests, you need to be able to see what goes into the temp file or directory and you want to
-know it's exact path and not having it randomized on every test re-run.
+However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
+to know its exact path and not have it randomized on every test re-run.

 A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
 :obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
@@ -736,32 +736,33 @@ Here is an example of its usage:

 This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.

-In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless
-``after=False`` is passed to the helper function.
-
-* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a
-  specific directory:
+* Create a unique temporary dir:

 .. code-block:: python

    def test_whatever(self):
-        tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
+        tmp_dir = self.get_auto_remove_tmp_dir()

-* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the
-  temp results:
+``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
+test.
+
+* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.

 .. code-block:: python

    def test_whatever(self):
-        tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
+        tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

-* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion
-  in the previous test run and want to make sure the that temporary directory is empty before the new test is run:
+This is useful for debugging when you want to monitor a specific directory and want to make sure the previous tests
+didn't leave any data in there.

-.. code-block:: python
+* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
+  following behaviors:

-   def test_whatever(self):
-        tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
+    - ``before=True``: the temporary dir will always be cleared at the beginning of the test.
+    - ``before=False``: if the temporary dir already existed, any existing files will remain there.
+    - ``after=True``: the temporary dir will always be deleted at the end of the test.
+    - ``after=False``: the temporary dir will always be left intact at the end of the test.

 .. note::
   In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
@@ -815,7 +816,7 @@ or the ``xfail`` way:
    @pytest.mark.xfail
    def test_feature_x():

-Here is how to skip a test based on some internal check inside the test:
+- Here is how to skip a test based on some internal check inside the test:

 .. code-block:: python

@@ -838,7 +839,7 @@ or the ``xfail`` way:
    def test_feature_x():
        pytest.xfail("expected to fail until bug XYZ is fixed")

-Here is how to skip all tests in a module if some import is missing:
+- Here is how to skip all tests in a module if some import is missing:

 .. code-block:: python

--- a/src/transformers/testing_utils.py
+++ b/src/transformers/testing_utils.py
@@ -522,45 +522,47 @@ class TestCasePlus(unittest.TestCase):
       - ``repo_root_dir_str``
       - ``src_dir_str``

-    Feature 2: Flexible auto-removable temp dirs which are guaranteed to get removed at the end of test.
+    Feature 2: Flexible auto-removable temporary dirs which are guaranteed to get removed at the end of test.

-    In all the following scenarios the temp dir will be auto-removed at the end of test, unless `after=False`.
-
-    # 1. create a unique temp dir, `tmp_dir` will contain the path to the created temp dir
+    1. Create a unique temporary dir:

    ::

        def test_whatever(self):
            tmp_dir = self.get_auto_remove_tmp_dir()

-    # 2. create a temp dir of my choice and delete it at the end - useful for debug when you want to # monitor a
-    specific directory
+    ``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
+    test.
+
+
+    2. Create a temporary dir of my choice, ensure it's empty before the test starts and don't
+    empty it after the test.

    ::

        def test_whatever(self):
-            tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
+            tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

-    # 3. create a temp dir of my choice and do not delete it at the end - useful for when you want # to look at the
-    temp results
+    This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests
+    didn't leave any data in there.

-    ::
-        def test_whatever(self):
-            tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
+    3. You can override the first two options by directly overriding the ``before`` and ``after`` args, leading to the
+       following behavior:

-    # 4. create a temp dir of my choice and ensure to delete it right away - useful for when you # disabled deletion in
-    the previous test run and want to make sure the that tmp dir is empty # before the new test is run
+    ``before=True``: the temporary dir will always be cleared at the beginning of the test.

-    ::
+    ``before=False``: if the temporary dir already existed, any existing files will remain there.

-        def test_whatever(self):
-            tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
+    ``after=True``: the temporary dir will always be deleted at the end of the test.

-    Note 1: In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are
-    allowed if an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem
-    will get nuked. i.e. please always pass paths that start with `./`
+    ``after=False``: the temporary dir will always be left intact at the end of the test.

-    Note 2: Each test can register multiple temp dirs and they all will get auto-removed, unless requested otherwise.
+    Note 1: In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are
+    allowed if an explicit ``tmp_dir`` is used, so that by mistake no ``/tmp`` or similar important part of the
+    filesystem will get nuked. i.e. please always pass paths that start with ``./``
+
+    Note 2: Each test can register multiple temporary dirs and they all will get auto-removed, unless requested
+    otherwise.

    Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite.
    This is useful for invoking external programs from the test suite - e.g. distributed training.
@@ -573,6 +575,7 @@ class TestCasePlus(unittest.TestCase):
    """

    def setUp(self):
+        # get_auto_remove_tmp_dir feature:
        self.teardown_tmp_dirs = []

        # figure out the resolved paths for repo_root, tests, examples, etc.
@@ -660,21 +663,42 @@ class TestCasePlus(unittest.TestCase):
        env["PYTHONPATH"] = ":".join(paths)
        return env

-    def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
+    def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
        """
        Args:
            tmp_dir (:obj:`string`, `optional`):
-                use this path, if None a unique path will be assigned
-            before (:obj:`bool`, `optional`, defaults to :obj:`False`):
-                if `True` and tmp dir already exists make sure to empty it right away
-            after (:obj:`bool`, `optional`, defaults to :obj:`True`):
-                delete the tmp dir at the end of the test
+                if :obj:`None`:
+
+                   - a unique temporary path will be created
+                   - sets ``before=True`` if ``before`` is :obj:`None`
+                   - sets ``after=True`` if ``after`` is :obj:`None`
+                else:
+
+                   - :obj:`tmp_dir` will be created
+                   - sets ``before=True`` if ``before`` is :obj:`None`
+                   - sets ``after=False`` if ``after`` is :obj:`None`
+            before (:obj:`bool`, `optional`):
+                If :obj:`True` and the :obj:`tmp_dir` already exists, make sure to empty it right away. If
+                :obj:`False` and the :obj:`tmp_dir` already exists, any existing files will remain there.
+            after (:obj:`bool`, `optional`):
+                If :obj:`True`, delete the :obj:`tmp_dir` at the end of the test. If :obj:`False`, leave the
+                :obj:`tmp_dir` and its contents intact at the end of the test.

        Returns:
-            tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-created tmp
+            tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-selected tmp
            dir
        """
        if tmp_dir is not None:
+
+            # defining the most likely desired behavior for when a custom path is provided.
+            # this most likely indicates the debug mode where we want an easily locatable dir that:
+            # 1. gets cleared out before the test (if it already exists)
+            # 2. is left intact after the test
+            if before is None:
+                before = True
+            if after is None:
+                after = False
+
            # using provided path
            path = Path(tmp_dir).resolve()

@@ -691,6 +715,15 @@ class TestCasePlus(unittest.TestCase):
            path.mkdir(parents=True, exist_ok=True)

        else:
+            # defining the most likely desired behavior for when a unique tmp path is auto generated
+            # (not a debug mode), here we require a unique tmp dir that:
+            # 1. is empty before the test (it will be empty in this situation anyway)
+            # 2. gets fully removed after the test
+            if before is None:
+                before = True
+            if after is None:
+                after = True
+
            # using unique tmp dir (always empty, regardless of `before`)
            tmp_dir = tempfile.mkdtemp()

@@ -701,7 +734,8 @@ class TestCasePlus(unittest.TestCase):
        return tmp_dir

    def tearDown(self):
-        # remove registered temp dirs
+
+        # get_auto_remove_tmp_dir feature: remove registered temp dirs
        for path in self.teardown_tmp_dirs:
            shutil.rmtree(path, ignore_errors=True)
        self.teardown_tmp_dirs = []
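
The three-state defaults introduced by this change (`None` means "pick the default that fits the situation") can be sketched in isolation as follows. This is a minimal illustration and not the code from `testing_utils.py`: the helper name `resolve_tmp_dir_defaults` is hypothetical, and it only mirrors the `if before is None` / `if after is None` branches shown in the diff, without creating or deleting any directory.

```python
def resolve_tmp_dir_defaults(tmp_dir=None, before=None, after=None):
    """Return the effective (before, after) flags.

    Hypothetical helper: it reproduces only the default-selection logic
    from the diff above and performs no filesystem operations.
    """
    if before is None:
        # Both branches in the diff default to clearing the dir up front.
        before = True
    if after is None:
        # Auto-generated unique dir: remove it at the end of the test.
        # Custom path (likely debugging): leave it intact for inspection.
        after = tmp_dir is None
    return before, after


if __name__ == "__main__":
    print(resolve_tmp_dir_defaults())                      # unique dir
    print(resolve_tmp_dir_defaults("./xxx"))               # custom dir
    print(resolve_tmp_dir_defaults("./xxx", after=True))   # explicit override
```

Explicit `before`/`after` values always win over the defaults, so a caller who passes a custom path but still wants cleanup can request `after=True`.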