Quality explain (#14264)

* Start PR doc

* Cleanup the quality checks and document them

* Add reference in the contributing guide

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename file as per review suggestion

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2025-07-03 04:40:06 +06:00 · 2021-11-03 17:43:19 -04:00 · 2021-11-03 17:43:19 -04:00 · f0d6e952c0
commit f0d6e952c0
parent a1c15ea855
6 changed files with 163 additions and 112 deletions
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -811,6 +811,28 @@ jobs:
            - run: python utils/custom_init_isort.py --check_only
            - run: flake8 examples tests src utils
            - run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
+
+    check_repository_consistency:
+        working_directory: ~/transformers
+        docker:
+            - image: circleci/python:3.6
+        resource_class: large
+        environment:
+            TRANSFORMERS_IS_CI: yes
+        parallelism: 1
+        steps:
+            - checkout
+            - restore_cache:
+                  keys:
+                      - v0.4-repository_consistency-{{ checksum "setup.py" }}
+                      - v0.4-{{ checksum "setup.py" }}
+            - run: pip install --upgrade pip
+            - run: pip install isort GitPython
+            - run: pip install .[all,quality]
+            - save_cache:
+                  key: v0.4-repository_consistency-{{ checksum "setup.py" }}
+                  paths:
+                      - '~/.cache/pip'
            - run: python utils/check_copies.py
            - run: python utils/check_table.py
            - run: python utils/check_dummies.py
@@ -819,17 +841,6 @@ jobs:
            - run: make deps_table_check_updated
            - run: python utils/tests_fetcher.py --sanity_check

-    check_repository_consistency:
-        working_directory: ~/transformers
-        docker:
-            - image: circleci/python:3.6
-        resource_class: small
-        parallelism: 1
-        steps:
-            - checkout
-            - run: pip install requests
-            - run: python ./utils/link_tester.py
-
    run_tests_layoutlmv2:
        working_directory: ~/transformers
        docker:
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -273,9 +273,11 @@ Follow these steps to start contributing:
   - If you are adding a new tokenizer, write tests, and make sure
     `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
   CircleCI does not run the slow tests, but github actions does every night!
-6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
+6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_bert.py` for an
   example.

+You can find more details about the checks run on a pull request in our [PR guide](https://huggingface.co/transformers/master/pr_checks.html).
+
 ### Tests

 An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
--- a/10
+++ b/10
@@ -31,9 +31,9 @@ deps_table_check_updated:

 autogenerate_code: deps_table_update

-# Check that source code meets quality standards
+# Check that the repo is in a good state

-extra_quality_checks:
+repo-consistency:
 	python utils/check_copies.py
 	python utils/check_table.py
 	python utils/check_dummies.py
@@ -42,12 +42,13 @@ extra_quality_checks:
 	python utils/tests_fetcher.py --sanity_check

 # this target runs checks on all files
+
 quality:
 	black --check $(check_dirs)
 	isort --check-only $(check_dirs)
 	python utils/custom_init_isort.py --check_only
 	flake8 $(check_dirs)
-	${MAKE} extra_quality_checks
+	python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only

 # Format source code automatically and check is there are any problems left that need manual fixing

@@ -56,6 +57,7 @@ extra_style_checks:
 	python utils/style_doc.py src/transformers docs/source --max_len 119

 # this target runs checks on all files and potentially modifies some of them
+
 style:
 	black $(check_dirs)
 	isort $(check_dirs)
@@ -64,7 +66,7 @@ style:

 # Super fast fix and check target that only works on relevant modified files since the branch was made

-fixup: modified_only_fixup extra_style_checks autogenerate_code extra_quality_checks
+fixup: modified_only_fixup extra_style_checks autogenerate_code repo-consistency

 # Make marked copies of snippets of codes conform to the original

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -559,6 +559,7 @@ Flax), PyTorch, and/or TensorFlow.
    testing
    debugging
    serialization
+    pr_checks

 .. toctree::
    :maxdepth: 2
--- a/docs/source/pr_checks.md
+++ b/docs/source/pr_checks.md
@@ -0,0 +1,131 @@
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Checks on a Pull Request
+
+When you open a pull request on 🤗 Transformers, a fair number of checks are run to make sure the patch you are adding does not break anything. These checks fall into four categories:
+- regular tests
+- documentation build
+- code and documentation style
+- general repository consistency
+
+In this document, we explain what those checks are and the reasons behind them, as well as how to debug them locally if one of them fails on your PR.
+
+Note that running any of them locally requires a dev install:
+
+```bash
+pip install transformers[dev]
+```
+
+or, for an editable install, from inside the Transformers repo:
+
+```bash
+pip install -e .[dev]
+```
+
+## Tests
+
+All the jobs whose names begin with `ci/circleci: run_tests_` run parts of the Transformers test suite. Each of those jobs focuses on one part of the library in a specific environment: for instance, `ci/circleci: run_tests_pipelines_tf` runs the pipeline tests in an environment where only TensorFlow is installed.
+
+Note that, to avoid running tests when there is no real change in the modules they cover, only part of the test suite is run each time: a utility determines the differences in the library between before and after the PR (what GitHub shows you in the "Files changed" tab) and picks the tests impacted by that diff. That utility can be run locally with:
+
+```bash
+python utils/tests_fetcher.py
+```
+
+from the root of the Transformers repo. It will:
+
+1. Check, for each file in the diff, whether the changes are in the code or only in comments or docstrings. Only the files with real code changes are kept.
+2. Build an internal map that gives, for each file of the library's source code, all the files it recursively impacts. Module A is said to impact module B if module B imports module A. For the recursive impact, we need a chain of modules going from module A to module B in which each module imports the previous one.
+3. Apply this map to the files gathered in step 1, which gives us the list of library files impacted by the PR.
+4. Map each of those files to their corresponding test file(s) and get the list of tests to run.
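+
+The four steps above can be sketched in plain Python. This is a toy model for illustration, not the code of `utils/tests_fetcher.py`: it assumes the import graph is already available as a dictionary mapping each file to the files it imports directly, that the mapping from source files to test files is given as another dictionary, and that Python 3.8+ is used (docstrings parse as `ast.Constant`). All function and file names below are hypothetical.
+
+```python
+import ast
+from collections import defaultdict
+
+
+def strip_docstrings(tree):
+    """Remove the docstring of the module and of every class/function, in place."""
+    for node in ast.walk(tree):
+        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
+            first = node.body[0] if node.body else None
+            is_docstring = (
+                isinstance(first, ast.Expr)
+                and isinstance(first.value, ast.Constant)
+                and isinstance(first.value.value, str)
+            )
+            if is_docstring:
+                node.body = node.body[1:] or [ast.Pass()]
+    return tree
+
+
+def has_code_changes(old_source, new_source):
+    """Step 1: comments are dropped by the parser, docstrings by strip_docstrings."""
+    old_tree = strip_docstrings(ast.parse(old_source))
+    new_tree = strip_docstrings(ast.parse(new_source))
+    return ast.dump(old_tree) != ast.dump(new_tree)
+
+
+def build_impact_map(imports):
+    """Step 2: map each file to every file that imports it, directly or through a chain."""
+    importers = defaultdict(set)  # file -> files that import it directly
+    for module, deps in imports.items():
+        for dep in deps:
+            importers[dep].add(module)
+    impact = {}
+    for start in set(imports) | set(importers):
+        seen, stack = set(), [start]
+        while stack:
+            for importer in importers[stack.pop()]:
+                if importer not in seen:
+                    seen.add(importer)
+                    stack.append(importer)
+        impact[start] = seen
+    return impact
+
+
+def tests_to_run(changed_files, impact, test_map):
+    """Steps 3 and 4: expand the changed files with the map, then collect their tests."""
+    impacted = set(changed_files)
+    for path in changed_files:
+        impacted |= impact.get(path, set())
+    return sorted({test for path in impacted for test in test_map.get(path, [])})
+
+
+# Hypothetical example: roberta imports bert, which imports utils.
+imports = {
+    "modeling_utils.py": set(),
+    "modeling_bert.py": {"modeling_utils.py"},
+    "modeling_roberta.py": {"modeling_bert.py"},
+}
+test_map = {
+    "modeling_bert.py": ["tests/test_modeling_bert.py"],
+    "modeling_roberta.py": ["tests/test_modeling_roberta.py"],
+}
+impact = build_impact_map(imports)
+print(tests_to_run(["modeling_bert.py"], impact, test_map))
+# ['tests/test_modeling_bert.py', 'tests/test_modeling_roberta.py']
+```
+
+Building the `imports` dictionary itself (for instance by parsing the import statements of every file) is left out of this sketch.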
+
+When executing the script locally, you should see the results of steps 1, 3 and 4 printed, and thus know which tests are run. The script also creates a file named `test_list.txt` containing the list of tests to run, which you can run locally with the following command:
+
+```bash
+python -m pytest -n 8 --dist=loadfile -rA -s $(cat test_list.txt)
+```
+
+Just in case anything slipped through the cracks, the full test suite is also run daily.
+
+## Documentation build
+
+The job `ci/circleci: build_doc` runs a build of the documentation to make sure everything will be okay once your PR is merged. If that step fails, you can inspect it locally by going into the `docs` folder of the Transformers repo and typing:
+
+```bash
+make html
+```
+
+Sphinx is not known for its helpful error messages, so you might have to try a few things to really find the source of the error.
+
+## Code and documentation style
+
+Code formatting is applied to all the source files, the examples and the tests using `black` and `isort`. We also have a custom tool taking care of the formatting of docstrings and `rst` files (`utils/style_doc.py`), as well as the order of the lazy imports performed in the Transformers `__init__.py` files (`utils/custom_init_isort.py`). All of this can be launched by executing:
+
+```bash
+make style
+```
+
+The `ci/circleci: check_code_quality` check verifies that these have been applied. It also runs `flake8`, which takes a basic look at your code and complains if it finds an undefined or unused variable. To run that check locally, use:
+
+```bash
+make quality
+```
+
+This can take a lot of time, so to run the same checks only on the files you modified in the current branch, run:
+
+```bash
+make fixup
+```
+
+This last command also runs all the additional checks for repository consistency. Let's have a look at them.
+
+## Repository consistency
+
+This groups all the tests that make sure your PR leaves the repository in a good state, and is performed by the `ci/circleci: check_repository_consistency` check. You can run that check locally by executing the following:
+
+```bash
+make repo-consistency
+```
+
+This checks that:
+
+- All objects added to the init are documented (performed by `utils/check_repo.py`)
+- All `__init__.py` files have the same content in their two sections (performed by `utils/check_inits.py`)
+- All code identified as a copy from another module is consistent with the original (performed by `utils/check_copies.py`)
+- The translations of the READMEs and the index of the doc have the same model list as the main README (performed by `utils/check_copies.py`)
+- The auto-generated tables in the documentation are up to date (performed by `utils/check_table.py`)
+- The library has all objects available even if not all optional dependencies are installed (performed by `utils/check_dummies.py`)
+
+If this check fails, the first two items require manual fixing; the last four can be fixed automatically by running:
+
+```bash
+make fix-copies
+```
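+
+To give an idea of how the copy check listed above can work, here is a simplified sketch; it is not the code of `utils/check_copies.py`. It assumes a copy is marked by a comment of the form `# Copied from <original> with Old->New` (the original path is only parsed here, not resolved), and that the copy is consistent when applying the listed renamings to the original code gives back the copied code. The helper names are hypothetical.
+
+```python
+import re
+
+# Simplified marker, e.g. "# Copied from bert.BertOutput with Bert->Roberta,bert->roberta"
+COPY_MARKER = re.compile(r"#\s*Copied from\s+(\S+)(?:\s+with\s+(.+))?")
+
+
+def parse_renamings(spec):
+    """Turn 'Bert->Roberta,bert->roberta' into {'Bert': 'Roberta', 'bert': 'roberta'}."""
+    renamings = {}
+    for item in (spec or "").split(","):
+        if item.strip():
+            old, new = item.split("->")
+            renamings[old.strip()] = new.strip()
+    return renamings
+
+
+def apply_renamings(code, renamings):
+    """Apply all renamings in a single pass, so replaced text is never scanned again."""
+    if not renamings:
+        return code
+    # Longest names first, so a longer name wins over a shorter one matching at the same spot.
+    names = sorted(renamings, key=len, reverse=True)
+    pattern = re.compile("|".join(re.escape(name) for name in names))
+    return pattern.sub(lambda match: renamings[match.group(0)], code)
+
+
+def is_consistent_copy(marker_line, original_code, copied_code):
+    """Return True if the copy matches the original once the renamings are applied."""
+    match = COPY_MARKER.search(marker_line)
+    if match is None:
+        raise ValueError(f"Not a copy marker: {marker_line!r}")
+    expected = apply_renamings(original_code, parse_renamings(match.group(2)))
+    return expected.strip() == copied_code.strip()
+
+
+original = "class BertOutput:\n    def forward(self, x):\n        return self.bert_dense(x)\n"
+copy = "class RobertaOutput:\n    def forward(self, x):\n        return self.roberta_dense(x)\n"
+marker = "# Copied from bert.BertOutput with Bert->Roberta,bert->roberta"
+print(is_consistent_copy(marker, original, copy))  # True
+```
+
+A naive chain of `str.replace` calls would fail on this very example: once `Bert` has become `Roberta`, the `bert` inside `Roberta` would be renamed again. The sketch avoids this by substituting all names in a single regular-expression pass.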
+
+Additional checks concern PRs that add new models, mainly that:
+
+- All models added are in an Auto-mapping (performed by `utils/check_repo.py`)
+<!-- TODO Sylvain, add a check that makes sure the common tests are implemented.-->
+- All models are properly tested (performed by `utils/check_repo.py`)
+
+<!-- TODO Sylvain, add the following
+- All models are added to the main README, inside the master doc
+- All checkpoints used actually exist on the Hub
+
+-->
--- a/utils/link_tester.py
+++ b/utils/link_tester.py
@@ -1,96 +0,0 @@
-# Copyright 2020 The HuggingFace Team. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-"""
-Link tester.
-
-This little utility reads all the python files in the repository,
-scans for links pointing to S3 and tests the links one by one. Raises an error
-at the end of the scan if at least one link was reported broken.
-"""
-import os
-import re
-import sys
-
-import requests
-
-
-REGEXP_FIND_S3_LINKS = r"""([\"'])(https:\/\/s3)(.*)?\1"""
-
-
-S3_BUCKET_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert"
-
-
-def list_python_files_in_repository():
-    """List all python files in the repository.
-
-    This function assumes that the script is executed in the root folder.
-    """
-    source_code_files = []
-    for path, subdirs, files in os.walk("."):
-        if "templates" in path:
-            continue
-        for name in files:
-            if ".py" in name and ".pyc" not in name:
-                path_to_files = os.path.join(path, name)
-                source_code_files.append(path_to_files)
-
-    return source_code_files
-
-
-def find_all_links(file_paths):
-    links = []
-    for path in file_paths:
-        links += scan_code_for_links(path)
-
-    return [link for link in links if link != S3_BUCKET_PREFIX]
-
-
-def scan_code_for_links(source):
-    """Scans the file to find links using a regular expression.
-    Returns a list of links.
-    """
-    with open(source, "r") as content:
-        content = content.read()
-        raw_links = re.findall(REGEXP_FIND_S3_LINKS, content)
-        links = [prefix + suffix for _, prefix, suffix in raw_links]
-
-    return links
-
-
-def check_all_links(links):
-    """Check that the provided links are valid.
-
-    Links are considered valid if a HEAD request to the server
-    returns a 200 status code.
-    """
-    broken_links = []
-    for link in links:
-        head = requests.head(link)
-        if head.status_code != 200:
-            broken_links.append(link)
-
-    return broken_links
-
-
-if __name__ == "__main__":
-    file_paths = list_python_files_in_repository()
-    links = find_all_links(file_paths)
-    broken_links = check_all_links(links)
-    print("Looking for broken links to pre-trained models/configs/tokenizers...")
-    if broken_links:
-        print("The following links did not respond:")
-        for link in broken_links:
-            print(f"- {link}")
-        sys.exit(1)
-    print("All links are ok.")