[testing utils] get_auto_remove_tmp_dir more intuitive behavior (#8401)

* [testing utils] get_auto_remove_tmp_dir default change

Now that I have been using `get_auto_remove_tmp_dir default change` for a while, I realized that the defaults aren't most optimal.

99% of the time we want the tmp dir to be empty at the beginning of the test - so changing the default to `before=True` - this shouldn't impact any tests since this feature is used only during debug.

* simplify things

* update docs

* fix doc layout

* style

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* better 3-state doc

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* s/tmp/temporary/ + style

* correct the statement

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
Stas Bekman 2020-11-10 08:57:21 -08:00 committed by GitHub
parent e7e1549895
commit e21340da7a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 83 additions and 48 deletions

View File

@ -716,11 +716,11 @@ Temporary files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite
each other's data. Also we want to get the temp files and directories removed at the end of each test that created
each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
them. Therefore, using packages like ``tempfile``, which address these needs is essential.
However, when debugging tests, you need to be able to see what goes into the temp file or directory and you want to
know it's exact path and not having it randomized on every test re-run.
However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
to know it's exact path and not having it randomized on every test re-run.
A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
@ -736,32 +736,33 @@ Here is an example of its usage:
This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.
In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless
``after=False`` is passed to the helper function.
* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a
specific directory:
* Create a unique temporary dir:
.. code-block:: python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
tmp_dir = self.get_auto_remove_tmp_dir()
* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the
temp results:
``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.
* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.
.. code-block:: python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion
in the previous test run and want to make sure the that temporary directory is empty before the new test is run:
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests didn't
leave any data in there.
.. code-block:: python
* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
following behaviors:
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
- ``before=True``: the temporary dir will always be cleared at the beginning of the test.
- ``before=False``: if the temporary dir already existed, any existing files will remain there.
- ``after=True``: the temporary dir will always be deleted at the end of the test.
- ``after=False``: the temporary dir will always be left intact at the end of the test.
.. note::
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
@ -815,7 +816,7 @@ or the ``xfail`` way:
@pytest.mark.xfail
def test_feature_x():
Here is how to skip a test based on some internal check inside the test:
- Here is how to skip a test based on some internal check inside the test:
.. code-block:: python
@ -838,7 +839,7 @@ or the ``xfail`` way:
def test_feature_x():
pytest.xfail("expected to fail until bug XYZ is fixed")
Here is how to skip all tests in a module if some import is missing:
- Here is how to skip all tests in a module if some import is missing:
.. code-block:: python

View File

@ -522,45 +522,47 @@ class TestCasePlus(unittest.TestCase):
- ``repo_root_dir_str``
- ``src_dir_str``
Feature 2: Flexible auto-removable temp dirs which are guaranteed to get removed at the end of test.
Feature 2: Flexible auto-removable temporary dirs which are guaranteed to get removed at the end of test.
In all the following scenarios the temp dir will be auto-removed at the end of test, unless `after=False`.
# 1. create a unique temp dir, `tmp_dir` will contain the path to the created temp dir
1. Create a unique temporary dir:
::
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir()
# 2. create a temp dir of my choice and delete it at the end - useful for debug when you want to # monitor a
specific directory
``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.
2. Create a temporary dir of my choice, ensure it's empty before the test starts and don't
empty it after the test.
::
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
# 3. create a temp dir of my choice and do not delete it at the end - useful for when you want # to look at the
temp results
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests
didn't leave any data in there.
::
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
3. You can override the first two options by directly overriding the ``before`` and ``after`` args, leading to the
following behavior:
# 4. create a temp dir of my choice and ensure to delete it right away - useful for when you # disabled deletion in
the previous test run and want to make sure the that tmp dir is empty # before the new test is run
``before=True``: the temporary dir will always be cleared at the beginning of the test.
::
``before=False``: if the temporary dir already existed, any existing files will remain there.
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
``after=True``: the temporary dir will always be deleted at the end of the test.
Note 1: In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are
allowed if an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem
will get nuked. i.e. please always pass paths that start with `./`
``after=False``: the temporary dir will always be left intact at the end of the test.
Note 2: Each test can register multiple temp dirs and they all will get auto-removed, unless requested otherwise.
Note 1: In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are
allowed if an explicit ``tmp_dir`` is used, so that by mistake no ``/tmp`` or similar important part of the
filesystem will get nuked. i.e. please always pass paths that start with ``./``
Note 2: Each test can register multiple temporary dirs and they all will get auto-removed, unless requested
otherwise.
Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite.
This is useful for invoking external programs from the test suite - e.g. distributed training.
@ -573,6 +575,7 @@ class TestCasePlus(unittest.TestCase):
"""
def setUp(self):
# get_auto_remove_tmp_dir feature:
self.teardown_tmp_dirs = []
# figure out the resolved paths for repo_root, tests, examples, etc.
@ -660,21 +663,42 @@ class TestCasePlus(unittest.TestCase):
env["PYTHONPATH"] = ":".join(paths)
return env
def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
"""
Args:
tmp_dir (:obj:`string`, `optional`):
use this path, if None a unique path will be assigned
before (:obj:`bool`, `optional`, defaults to :obj:`False`):
if `True` and tmp dir already exists make sure to empty it right away
after (:obj:`bool`, `optional`, defaults to :obj:`True`):
delete the tmp dir at the end of the test
if :obj:`None`:
- a unique temporary path will be created
- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=True`` if ``after`` is :obj:`None`
else:
- :obj:`tmp_dir` will be created
- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=False`` if ``after`` is :obj:`None`
before (:obj:`bool`, `optional`):
If :obj:`True` and the :obj:`tmp_dir` already exists, make sure to empty it right away if :obj:`False`
and the :obj:`tmp_dir` already exists, any existing files will remain there.
after (:obj:`bool`, `optional`):
If :obj:`True`, delete the :obj:`tmp_dir` at the end of the test if :obj:`False`, leave the
:obj:`tmp_dir` and its contents intact at the end of the test.
Returns:
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-created tmp
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-selected tmp
dir
"""
if tmp_dir is not None:
# defining the most likely desired behavior for when a custom path is provided.
# this most likely indicates the debug mode where we want an easily locatable dir that:
# 1. gets cleared out before the test (if it already exists)
# 2. is left intact after the test
if before is None:
before = True
if after is None:
after = False
# using provided path
path = Path(tmp_dir).resolve()
@ -691,6 +715,15 @@ class TestCasePlus(unittest.TestCase):
path.mkdir(parents=True, exist_ok=True)
else:
# defining the most likely desired behavior for when a unique tmp path is auto generated
# (not a debug mode), here we require a unique tmp dir that:
# 1. is empty before the test (it will be empty in this situation anyway)
# 2. gets fully removed after the test
if before is None:
before = True
if after is None:
after = True
# using unique tmp dir (always empty, regardless of `before`)
tmp_dir = tempfile.mkdtemp()
@ -701,7 +734,8 @@ class TestCasePlus(unittest.TestCase):
return tmp_dir
def tearDown(self):
# remove registered temp dirs
# get_auto_remove_tmp_dir feature: remove registered temp dirs
for path in self.teardown_tmp_dirs:
shutil.rmtree(path, ignore_errors=True)
self.teardown_tmp_dirs = []