translation main-class files to chinese (#27588)

* translate work
* update
* update
* update [[autodoc]]
* Update callback.md

Co-authored-by: jiaqiw <wangjiaqi50@huawei.com>

parent 74a3cebfa5
commit cad1b1192b
@ -68,11 +68,46 @@
  title: 概念指南
- sections:
  - sections:
    - local: main_classes/agent
      title: Agents和工具
    - local: main_classes/callback
      title: Callbacks
    - local: main_classes/configuration
      title: Configuration
    - local: main_classes/data_collator
      title: Data Collator
    - local: main_classes/keras_callbacks
      title: Keras callbacks
    - local: main_classes/logging
      title: Logging
    - local: main_classes/model
      title: 模型
    - local: main_classes/text_generation
      title: 文本生成
    - local: main_classes/onnx
      title: ONNX
    - local: main_classes/optimizer_schedules
      title: Optimization
    - local: main_classes/output
      title: 模型输出
    - local: main_classes/pipelines
      title: Pipelines
    - local: main_classes/processors
      title: Processors
    - local: main_classes/quantization
      title: Quantization
    - local: main_classes/tokenizer
      title: Tokenizer
    - local: main_classes/trainer
      title: Trainer
    - local: main_classes/deepspeed
      title: DeepSpeed集成
    - local: main_classes/feature_extractor
      title: Feature Extractor
    - local: main_classes/image_processor
      title: Image Processor
    title: 主要类
  title: 应用程序接口 (API)
docs/source/zh/main_classes/agent.md (new file)
@ -0,0 +1,101 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Agents和工具
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
Transformers Agents是一个实验性的API,它随时可能发生变化。由于API或底层模型容易发生变化,因此由agents返回的结果可能会有所不同。
|
||||
|
||||
|
||||
</Tip>
|
||||
|
||||
要了解更多关于agents和工具的信息,请确保阅读[介绍指南](../transformers_agents)。此页面包含底层类的API文档。
|
||||
|
||||
|
||||
## Agents
|
||||
|
||||
我们提供三种类型的agents:[`HfAgent`]使用开源模型的推理端点,[`LocalAgent`]使用您在本地选择的模型,[`OpenAiAgent`]使用OpenAI封闭模型。
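下面是一个实例化[`HfAgent`]并运行任务的简化示例(其中的推理端点URL与提示词仅为演示用的假设):

```python
from transformers import HfAgent

# 使用一个开源模型的推理端点(此处的StarCoder端点仅作演示)
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

# agent会根据提示生成并执行调用工具的代码
agent.run("Draw me a picture of rivers and lakes.")
```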
|
||||
|
||||
|
||||
### HfAgent
|
||||
|
||||
[[autodoc]] HfAgent
|
||||
|
||||
### LocalAgent
|
||||
|
||||
[[autodoc]] LocalAgent
|
||||
|
||||
### OpenAiAgent
|
||||
|
||||
[[autodoc]] OpenAiAgent
|
||||
|
||||
### AzureOpenAiAgent
|
||||
|
||||
[[autodoc]] AzureOpenAiAgent
|
||||
|
||||
### Agent
|
||||
|
||||
[[autodoc]] Agent
|
||||
- chat
|
||||
- run
|
||||
- prepare_for_new_chat
|
||||
|
||||
## 工具
|
||||
|
||||
### load_tool
|
||||
|
||||
[[autodoc]] load_tool
|
||||
|
||||
### Tool
|
||||
|
||||
[[autodoc]] Tool
|
||||
|
||||
### PipelineTool
|
||||
|
||||
[[autodoc]] PipelineTool
|
||||
|
||||
### RemoteTool
|
||||
|
||||
[[autodoc]] RemoteTool
|
||||
|
||||
### launch_gradio_demo
|
||||
|
||||
[[autodoc]] launch_gradio_demo
|
||||
|
||||
## Agent类型
|
||||
|
||||
Agents可以处理工具之间任何类型的对象;工具是多模态的,可以接受和返回文本、图像、音频、视频等类型。为了增加工具之间的兼容性,以及正确地在ipython(jupyter、colab、ipython notebooks等)中呈现这些返回值,我们实现了这些类型的包装类。
|
||||
|
||||
被包装的对象应该继续按照最初的行为方式运作;文本对象应该仍然像字符串一样运作,图像对象应该仍然像`PIL.Image`一样运作。
|
||||
|
||||
这些类型有三个特定目的:
|
||||
|
||||
- 对类型调用 `to_raw` 应该返回底层对象
|
||||
- 对类型调用 `to_string` 应该将对象作为字符串返回:在`AgentText`的情况下可能是字符串,但在其他情况下可能是对象序列化版本的路径
|
||||
- 在ipython内核中显示它应该正确显示对象
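例如,下面是一个展示这些包装类型行为的简化示意(仅作演示假设):

```python
from transformers.tools.agent_types import AgentText

text = AgentText("Hello world")

print(text.to_raw())     # 返回底层的字符串对象
print(text.to_string())  # 对AgentText而言,同样返回字符串本身
```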
|
||||
|
||||
### AgentText
|
||||
|
||||
[[autodoc]] transformers.tools.agent_types.AgentText
|
||||
|
||||
### AgentImage
|
||||
|
||||
[[autodoc]] transformers.tools.agent_types.AgentImage
|
||||
|
||||
### AgentAudio
|
||||
|
||||
[[autodoc]] transformers.tools.agent_types.AgentAudio
|
docs/source/zh/main_classes/callback.md (new file)
@ -0,0 +1,125 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Callbacks
|
||||
|
||||
|
||||
Callbacks是可以自定义PyTorch [`Trainer`]中训练循环行为的对象(此功能尚未在TensorFlow中实现),它可以检查训练循环状态(用于进度报告、在TensorBoard或其他ML平台上记录日志等),并做出决策(例如提前停止)。
|
||||
|
||||
Callbacks是“只读”的代码片段,除了它们返回的[`TrainerControl`]对象外,它们不能更改训练循环中的任何内容。对于需要更改训练循环的自定义,您应该继承[`Trainer`]并重载您需要的方法(有关示例,请参见[trainer](trainer))。
|
||||
|
||||
默认情况下,`TrainingArguments.report_to` 设置为 `"all"`,此时[`Trainer`]将使用以下callbacks:
|
||||
|
||||
|
||||
- [`DefaultFlowCallback`],它处理默认的日志记录、保存和评估行为
|
||||
- [`PrinterCallback`] 或 [`ProgressCallback`],用于显示进度和打印日志(如果通过[`TrainingArguments`]停用tqdm,则使用第一个函数;否则使用第二个)。
|
||||
- [`~integrations.TensorBoardCallback`],如果TensorBoard可访问(通过PyTorch版本 >= 1.4 或者 tensorboardX)。
|
||||
- [`~integrations.WandbCallback`],如果安装了[wandb](https://www.wandb.com/)。
|
||||
- [`~integrations.CometCallback`],如果安装了[comet_ml](https://www.comet.ml/site/)。
|
||||
- [`~integrations.MLflowCallback`],如果安装了[mlflow](https://www.mlflow.org/)。
|
||||
- [`~integrations.NeptuneCallback`],如果安装了[neptune](https://neptune.ai/)。
|
||||
- [`~integrations.AzureMLCallback`],如果安装了[azureml-sdk](https://pypi.org/project/azureml-sdk/)。
|
||||
- [`~integrations.CodeCarbonCallback`],如果安装了[codecarbon](https://pypi.org/project/codecarbon/)。
|
||||
- [`~integrations.ClearMLCallback`],如果安装了[clearml](https://github.com/allegroai/clearml)。
|
||||
- [`~integrations.DagsHubCallback`],如果安装了[dagshub](https://dagshub.com/)。
|
||||
- [`~integrations.FlyteCallback`],如果安装了[flyte](https://flyte.org/)。
|
||||
- [`~integrations.DVCLiveCallback`],如果安装了[dvclive](https://dvc.org/doc/dvclive)。
|
||||
|
||||
如果安装了一个软件包,但您不希望使用相关的集成,您可以将 `TrainingArguments.report_to` 更改为仅包含您想要使用的集成的列表(例如 `["azure_ml", "wandb"]`)。
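例如,下面是一个只启用wandb集成的简化示例(其中的输出目录名仅为演示假设):

```python
from transformers import TrainingArguments

# 只向wandb上报训练日志;"output"目录名仅作演示
args = TrainingArguments(output_dir="output", report_to=["wandb"])
```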
|
||||
|
||||
实现callbacks的主要类是[`TrainerCallback`]。它获取用于实例化[`Trainer`]的[`TrainingArguments`],可以通过[`TrainerState`]访问该Trainer的内部状态,并可以通过[`TrainerControl`]对训练循环执行一些操作。
|
||||
|
||||
|
||||
## 可用的Callbacks
|
||||
|
||||
这里是库里可用[`TrainerCallback`]的列表:
|
||||
|
||||
[[autodoc]] integrations.CometCallback
|
||||
- setup
|
||||
|
||||
[[autodoc]] DefaultFlowCallback
|
||||
|
||||
[[autodoc]] PrinterCallback
|
||||
|
||||
[[autodoc]] ProgressCallback
|
||||
|
||||
[[autodoc]] EarlyStoppingCallback
|
||||
|
||||
[[autodoc]] integrations.TensorBoardCallback
|
||||
|
||||
[[autodoc]] integrations.WandbCallback
|
||||
- setup
|
||||
|
||||
[[autodoc]] integrations.MLflowCallback
|
||||
- setup
|
||||
|
||||
[[autodoc]] integrations.AzureMLCallback
|
||||
|
||||
[[autodoc]] integrations.CodeCarbonCallback
|
||||
|
||||
[[autodoc]] integrations.NeptuneCallback
|
||||
|
||||
[[autodoc]] integrations.ClearMLCallback
|
||||
|
||||
[[autodoc]] integrations.DagsHubCallback
|
||||
|
||||
[[autodoc]] integrations.FlyteCallback
|
||||
|
||||
[[autodoc]] integrations.DVCLiveCallback
|
||||
- setup
|
||||
|
||||
## TrainerCallback
|
||||
|
||||
[[autodoc]] TrainerCallback
|
||||
|
||||
以下是如何使用PyTorch [`Trainer`]注册自定义callback的示例:
|
||||
|
||||
```python
|
||||
class MyCallback(TrainerCallback):
|
||||
"A callback that prints a message at the beginning of training"
|
||||
|
||||
def on_train_begin(self, args, state, control, **kwargs):
|
||||
print("Starting training")
|
||||
|
||||
|
||||
trainer = Trainer(
|
||||
model,
|
||||
args,
|
||||
train_dataset=train_dataset,
|
||||
eval_dataset=eval_dataset,
|
||||
callbacks=[MyCallback], # We can either pass the callback class this way or an instance of it (MyCallback())
|
||||
)
|
||||
```
|
||||
|
||||
注册callback的另一种方式是调用 `trainer.add_callback()`,如下所示:
|
||||
|
||||
|
||||
```python
|
||||
trainer = Trainer(...)
|
||||
trainer.add_callback(MyCallback)
|
||||
# Alternatively, we can pass an instance of the callback class
|
||||
trainer.add_callback(MyCallback())
|
||||
```
|
||||
|
||||
## TrainerState
|
||||
|
||||
[[autodoc]] TrainerState
|
||||
|
||||
## TrainerControl
|
||||
|
||||
[[autodoc]] TrainerControl
|
docs/source/zh/main_classes/configuration.md (new file)
@ -0,0 +1,28 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Configuration
|
||||
|
||||
基类[`PretrainedConfig`]实现了从本地文件或目录加载/保存配置的常见方法,或下载库提供的预训练模型配置(从HuggingFace的AWS S3库中下载)。
|
||||
|
||||
每个派生的配置类都实现了特定于模型的属性。所有配置类中共同存在的属性有:`hidden_size`、`num_attention_heads` 和 `num_hidden_layers`。文本模型进一步添加了 `vocab_size`。
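下面是一个加载、修改并保存配置的简化示例(其中的本地目录名仅为演示假设):

```python
from transformers import BertConfig

config = BertConfig.from_pretrained("bert-base-uncased")  # 从Hub加载预训练配置
config.num_hidden_layers = 6                              # 修改特定于模型的属性
config.save_pretrained("./my_bert_config")                # 保存到本地目录(演示路径)
```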
|
||||
|
||||
|
||||
## PretrainedConfig
|
||||
|
||||
[[autodoc]] PretrainedConfig
|
||||
- push_to_hub
|
||||
- all
|
docs/source/zh/main_classes/data_collator.md (new file)
@ -0,0 +1,65 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Data Collator
|
||||
|
||||
Data collator是以数据集元素列表作为输入来构建批次的对象。这些元素与 `train_dataset` 或 `eval_dataset` 的元素类型相同。
|
||||
|
||||
为了能够构建批次,Data collators可能会应用一些预处理(比如填充)。其中一些(比如[`DataCollatorForLanguageModeling`])还会在形成的批次上应用一些随机数据增强(比如随机掩码)。
|
||||
|
||||
在[示例脚本](../examples)或[示例notebooks](../notebooks)中可以找到使用的示例。
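作为参考,下面是一个使用 [`DataCollatorWithPadding`] 进行动态填充的简化示例(其中的checkpoint名称仅为演示假设):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # 演示用checkpoint
collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 两条长度不同的样本,由collator填充成同一长度的批次张量
features = [tokenizer("Hello world"), tokenizer("A much longer example sentence")]
batch = collator(features)
print(batch["input_ids"].shape)
```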
|
||||
|
||||
|
||||
## Default data collator
|
||||
|
||||
[[autodoc]] data.data_collator.default_data_collator
|
||||
|
||||
## DefaultDataCollator
|
||||
|
||||
[[autodoc]] data.data_collator.DefaultDataCollator
|
||||
|
||||
## DataCollatorWithPadding
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorWithPadding
|
||||
|
||||
## DataCollatorForTokenClassification
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorForTokenClassification
|
||||
|
||||
## DataCollatorForSeq2Seq
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorForSeq2Seq
|
||||
|
||||
## DataCollatorForLanguageModeling
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorForLanguageModeling
|
||||
- numpy_mask_tokens
|
||||
- tf_mask_tokens
|
||||
- torch_mask_tokens
|
||||
|
||||
## DataCollatorForWholeWordMask
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorForWholeWordMask
|
||||
- numpy_mask_tokens
|
||||
- tf_mask_tokens
|
||||
- torch_mask_tokens
|
||||
|
||||
## DataCollatorForPermutationLanguageModeling
|
||||
|
||||
[[autodoc]] data.data_collator.DataCollatorForPermutationLanguageModeling
|
||||
- numpy_mask_tokens
|
||||
- tf_mask_tokens
|
||||
- torch_mask_tokens
|
docs/source/zh/main_classes/feature_extractor.md (new file)
@ -0,0 +1,39 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Feature Extractor
|
||||
|
||||
Feature Extractor负责为音频或视觉模型准备输入特征。这包括从序列中提取特征,例如,对音频文件进行预处理以生成Log-Mel频谱特征,以及从图像中提取特征,例如,裁剪图像文件,同时还包括填充、归一化和转换为NumPy、PyTorch和TensorFlow张量。
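下面是一个用feature extractor预处理原始音频的简化示例(其中的checkpoint与占位音频数据仅为演示假设):

```python
import numpy as np
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")  # 演示用checkpoint

raw_audio = np.zeros(16000, dtype=np.float32)  # 1秒16kHz的静音,仅作占位数据
inputs = feature_extractor(raw_audio, sampling_rate=16000, return_tensors="pt")
print(inputs["input_values"].shape)
```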
|
||||
|
||||
|
||||
## FeatureExtractionMixin
|
||||
|
||||
[[autodoc]] feature_extraction_utils.FeatureExtractionMixin
|
||||
- from_pretrained
|
||||
- save_pretrained
|
||||
|
||||
## SequenceFeatureExtractor
|
||||
|
||||
[[autodoc]] SequenceFeatureExtractor
|
||||
- pad
|
||||
|
||||
## BatchFeature
|
||||
|
||||
[[autodoc]] BatchFeature
|
||||
|
||||
## ImageFeatureExtractionMixin
|
||||
|
||||
[[autodoc]] image_utils.ImageFeatureExtractionMixin
|
docs/source/zh/main_classes/image_processor.md (new file)
@ -0,0 +1,34 @@
|
||||
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Image Processor
|
||||
|
||||
Image processor负责为视觉模型准备输入特征,并对它们的输出进行后处理。这包括诸如调整大小、归一化和转换为PyTorch、TensorFlow、Flax和NumPy张量等变换。它还可能包括特定于模型的后处理,例如将logits转换为分割掩码。
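下面是一个用image processor预处理图像的简化示例(其中的checkpoint与占位图像仅为演示假设):

```python
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")  # 演示用checkpoint

image = Image.fromarray(np.zeros((256, 256, 3), dtype=np.uint8))  # 占位图像
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # 例如 (1, 3, 224, 224)
```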
|
||||
|
||||
|
||||
## ImageProcessingMixin
|
||||
|
||||
[[autodoc]] image_processing_utils.ImageProcessingMixin
|
||||
- from_pretrained
|
||||
- save_pretrained
|
||||
|
||||
## BatchFeature
|
||||
|
||||
[[autodoc]] BatchFeature
|
||||
|
||||
## BaseImageProcessor
|
||||
|
||||
[[autodoc]] image_processing_utils.BaseImageProcessor
|
docs/source/zh/main_classes/keras_callbacks.md (new file)
@ -0,0 +1,27 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Keras callbacks
|
||||
|
||||
在Keras中训练Transformers模型时,有一些库特定的callbacks函数可用于自动执行常见任务:
|
||||
|
||||
## KerasMetricCallback
|
||||
|
||||
[[autodoc]] KerasMetricCallback
|
||||
|
||||
## PushToHubCallback
|
||||
|
||||
[[autodoc]] PushToHubCallback
|
docs/source/zh/main_classes/logging.md (new file)
@ -0,0 +1,107 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Logging
|
||||
|
||||
🤗 Transformers拥有一个集中式的日志系统,因此您可以轻松设置库输出的日志详细程度。
|
||||
|
||||
当前库的默认日志详细程度为`WARNING`。
|
||||
|
||||
要更改日志详细程度,只需使用其中一个直接的setter。例如,以下是如何将日志详细程度更改为INFO级别的方法:
|
||||
|
||||
```python
|
||||
import transformers
|
||||
|
||||
transformers.logging.set_verbosity_info()
|
||||
```
|
||||
|
||||
您还可以使用环境变量`TRANSFORMERS_VERBOSITY`来覆盖默认的日志详细程度。您可以将其设置为以下级别之一:`debug`、`info`、`warning`、`error`、`critical`。例如:
|
||||
|
||||
```bash
|
||||
TRANSFORMERS_VERBOSITY=error ./myprogram.py
|
||||
```
|
||||
|
||||
此外,通过将环境变量`TRANSFORMERS_NO_ADVISORY_WARNINGS`设置为`true`(如*1*),可以禁用一些`warnings`。这将禁用[`logger.warning_advice`]记录的任何警告。例如:
|
||||
|
||||
```bash
|
||||
TRANSFORMERS_NO_ADVISORY_WARNINGS=1 ./myprogram.py
|
||||
```
|
||||
|
||||
以下是如何在您自己的模块或脚本中使用与库相同的logger的示例:
|
||||
|
||||
```python
|
||||
from transformers.utils import logging
|
||||
|
||||
logging.set_verbosity_info()
|
||||
logger = logging.get_logger("transformers")
|
||||
logger.info("INFO")
|
||||
logger.warning("WARN")
|
||||
```
|
||||
|
||||
|
||||
此日志模块的所有方法都在下面进行了记录,主要的方法包括 [`logging.get_verbosity`] 用于获取logger当前输出日志详细程度的级别和 [`logging.set_verbosity`] 用于将详细程度设置为您选择的级别。按照顺序(从最不详细到最详细),这些级别(及其相应的整数值)为:
|
||||
|
||||
- `transformers.logging.CRITICAL` 或 `transformers.logging.FATAL`(整数值,50):仅报告最关键的errors。
|
||||
- `transformers.logging.ERROR`(整数值,40):仅报告errors。
|
||||
- `transformers.logging.WARNING` 或 `transformers.logging.WARN`(整数值,30):仅报告error和warnings。这是库使用的默认级别。
|
||||
- `transformers.logging.INFO`(整数值,20):报告error、warnings和基本信息。
|
||||
- `transformers.logging.DEBUG`(整数值,10):报告所有信息。
|
||||
|
||||
默认情况下,将在模型下载期间显示`tqdm`进度条。[`logging.disable_progress_bar`] 和 [`logging.enable_progress_bar`] 可用于禁止或启用此行为。
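例如,下面的简化片段把日志级别设为ERROR并关闭下载进度条:

```python
import transformers

transformers.logging.set_verbosity(transformers.logging.ERROR)  # 只报告errors
transformers.logging.disable_progress_bar()                     # 下载时不再显示tqdm进度条
```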
|
||||
|
||||
## `logging` vs `warnings`
|
||||
|
||||
Python有两个经常一起使用的日志系统:如上所述的`logging`,和对特定buckets中的警告进行进一步分类的`warnings`,例如,`FutureWarning`用于输出已经被弃用的功能或路径,`DeprecationWarning`用于指示即将被弃用的内容。
|
||||
|
||||
我们在`transformers`库中同时使用这两个系统。我们利用并调整了`logging`的`captureWarnings`方法,以便通过上面的详细程度setters来管理这些警告消息。
|
||||
|
||||
对于库的开发人员,这意味着什么呢?我们应该遵循以下启发法则:
|
||||
- 库的开发人员和依赖于`transformers`的库应优先使用`warnings`
|
||||
- `logging`应该用于在日常项目中经常使用它的用户
|
||||
|
||||
以下是`captureWarnings`方法的参考。
|
||||
|
||||
[[autodoc]] logging.captureWarnings
|
||||
|
||||
## Base setters
|
||||
|
||||
[[autodoc]] logging.set_verbosity_error
|
||||
|
||||
[[autodoc]] logging.set_verbosity_warning
|
||||
|
||||
[[autodoc]] logging.set_verbosity_info
|
||||
|
||||
[[autodoc]] logging.set_verbosity_debug
|
||||
|
||||
## Other functions
|
||||
|
||||
[[autodoc]] logging.get_verbosity
|
||||
|
||||
[[autodoc]] logging.set_verbosity
|
||||
|
||||
[[autodoc]] logging.get_logger
|
||||
|
||||
[[autodoc]] logging.enable_default_handler
|
||||
|
||||
[[autodoc]] logging.disable_default_handler
|
||||
|
||||
[[autodoc]] logging.enable_explicit_format
|
||||
|
||||
[[autodoc]] logging.reset_format
|
||||
|
||||
[[autodoc]] logging.enable_progress_bar
|
||||
|
||||
[[autodoc]] logging.disable_progress_bar
|
@ -119,16 +119,16 @@ model = AutoModel.from_config(config)

## TFPreTrainedModel

[[autodoc]] TFPreTrainedModel
    - push_to_hub
    - all

## TFModelUtilsMixin

[[autodoc]] modeling_tf_utils.TFModelUtilsMixin

## FlaxPreTrainedModel

[[autodoc]] FlaxPreTrainedModel
    - push_to_hub
    - all

## 推送到 Hub

[[autodoc]] utils.PushToHubMixin
docs/source/zh/main_classes/onnx.md (new file)
@ -0,0 +1,50 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# 导出 🤗 Transformers 模型到 ONNX
|
||||
|
||||
🤗 Transformers提供了一个`transformers.onnx`包,通过利用配置对象,您可以将模型checkpoints转换为ONNX图。
|
||||
|
||||
有关更多详细信息,请参阅导出 🤗 Transformers 模型的[指南](../serialization)。
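作为参考,下面是一个使用该软件包把checkpoint导出为ONNX的简化命令(其中的模型名与输出目录仅为演示假设):

```bash
python -m transformers.onnx --model=distilbert-base-uncased onnx/
```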
|
||||
|
||||
## ONNX Configurations
|
||||
|
||||
我们提供了三个抽象类,取决于您希望导出的模型架构类型:
|
||||
|
||||
* 基于编码器的模型继承 [`~onnx.config.OnnxConfig`]
|
||||
* 基于解码器的模型继承 [`~onnx.config.OnnxConfigWithPast`]
|
||||
* 编码器-解码器模型继承 [`~onnx.config.OnnxSeq2SeqConfigWithPast`]
|
||||
|
||||
### OnnxConfig
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfig
|
||||
|
||||
### OnnxConfigWithPast
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfigWithPast
|
||||
|
||||
### OnnxSeq2SeqConfigWithPast
|
||||
|
||||
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
|
||||
|
||||
## ONNX Features
|
||||
|
||||
每个ONNX配置与一组 _特性_ 相关联,使您能够为不同类型的拓扑结构或任务导出模型。
|
||||
|
||||
### FeaturesManager
|
||||
|
||||
[[autodoc]] onnx.features.FeaturesManager
|
||||
|
docs/source/zh/main_classes/optimizer_schedules.md (new file)
@ -0,0 +1,77 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Optimization
|
||||
|
||||
`.optimization` 模块提供了:
|
||||
|
||||
- 一个带有固定权重衰减的优化器,可用于微调模型
|
||||
- 多个继承自 `_LRSchedule` 的学习率调度器(用法示例见下文)
|
||||
- 一个梯度累积类,用于累积多个批次的梯度
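下面是一个组合使用优化器与学习率调度器的简化示例(其中的占位模型、学习率和步数仅为演示假设):

```python
import torch
from transformers import AdamW, get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # 占位模型,仅作演示
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000  # 演示用步数
)

# 训练循环中的典型调用顺序
for step in range(5):
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```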
|
||||
|
||||
## AdamW (PyTorch)
|
||||
|
||||
[[autodoc]] AdamW
|
||||
|
||||
## AdaFactor (PyTorch)
|
||||
|
||||
[[autodoc]] Adafactor
|
||||
|
||||
## AdamWeightDecay (TensorFlow)
|
||||
|
||||
[[autodoc]] AdamWeightDecay
|
||||
|
||||
[[autodoc]] create_optimizer
|
||||
|
||||
## Schedules
|
||||
|
||||
### Learning Rate Schedules (Pytorch)
|
||||
|
||||
[[autodoc]] SchedulerType
|
||||
|
||||
[[autodoc]] get_scheduler
|
||||
|
||||
[[autodoc]] get_constant_schedule
|
||||
|
||||
[[autodoc]] get_constant_schedule_with_warmup
|
||||
|
||||
<img alt="" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/warmup_constant_schedule.png"/>
|
||||
|
||||
[[autodoc]] get_cosine_schedule_with_warmup
|
||||
|
||||
<img alt="" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/warmup_cosine_schedule.png"/>
|
||||
|
||||
[[autodoc]] get_cosine_with_hard_restarts_schedule_with_warmup
|
||||
|
||||
<img alt="" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/warmup_cosine_hard_restarts_schedule.png"/>
|
||||
|
||||
[[autodoc]] get_linear_schedule_with_warmup
|
||||
|
||||
<img alt="" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/warmup_linear_schedule.png"/>
|
||||
|
||||
[[autodoc]] get_polynomial_decay_schedule_with_warmup
|
||||
|
||||
[[autodoc]] get_inverse_sqrt_schedule
|
||||
|
||||
### Warmup (TensorFlow)
|
||||
|
||||
[[autodoc]] WarmUp
|
||||
|
||||
## Gradient Strategies
|
||||
|
||||
### GradientAccumulator (TensorFlow)
|
||||
|
||||
[[autodoc]] GradientAccumulator
|
docs/source/zh/main_classes/output.md (new file)
@ -0,0 +1,309 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# 模型输出
|
||||
|
||||
所有模型的输出都是 [`~utils.ModelOutput`] 的子类的实例。这些是包含模型返回的所有信息的数据结构,但也可以用作元组或字典。
|
||||
|
||||
让我们看一个例子:
|
||||
|
||||
```python
|
||||
from transformers import BertTokenizer, BertForSequenceClassification
|
||||
import torch
|
||||
|
||||
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
|
||||
|
||||
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
|
||||
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
|
||||
outputs = model(**inputs, labels=labels)
|
||||
```
|
||||
|
||||
`outputs` 对象是 [`~modeling_outputs.SequenceClassifierOutput`],如下面该类的文档中所示,它表示它有一个可选的 `loss`,一个 `logits`,一个可选的 `hidden_states` 和一个可选的 `attentions` 属性。在这里,我们有 `loss`,因为我们传递了 `labels`,但我们没有 `hidden_states` 和 `attentions`,因为我们没有传递 `output_hidden_states=True` 或 `output_attentions=True`。
|
||||
|
||||
<Tip>
|
||||
|
||||
当传递 `output_hidden_states=True` 时,您可能期望 `outputs.hidden_states[-1]` 与 `outputs.last_hidden_state` 完全匹配。然而,这并不总是成立。一些模型在返回最后的 hidden state时对其应用归一化或其他后续处理。
|
||||
|
||||
</Tip>
|
||||
|
||||
|
||||
您可以像往常一样访问每个属性,如果模型未返回该属性,您将得到 `None`。在这里,例如,`outputs.loss` 是模型计算的损失,而 `outputs.attentions` 是 `None`。
|
||||
|
||||
当将我们的 `outputs` 对象视为元组时,它仅考虑那些没有 `None` 值的属性。例如这里它有两个元素,`loss` 和 `logits`,所以
|
||||
|
||||
```python
|
||||
outputs[:2]
|
||||
```
|
||||
|
||||
将返回元组 `(outputs.loss, outputs.logits)`。
|
||||
|
||||
将我们的 `outputs` 对象视为字典时,它仅考虑那些没有 `None` 值的属性。例如在这里它有两个键,分别是 `loss` 和 `logits`。
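例如,接着上面的例子,下面几种访问方式得到的是同一个值:

```python
loss = outputs.loss      # 属性访问
loss = outputs["loss"]   # 像字典一样按键访问
loss = outputs[0]        # 像元组一样按位置访问
```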
|
||||
|
||||
我们在这里记录了被多个类型模型使用的通用模型输出。特定输出类型在其相应的模型页面上有文档。
|
||||
|
||||
## ModelOutput
|
||||
|
||||
[[autodoc]] utils.ModelOutput
|
||||
- to_tuple
|
||||
|
||||
## BaseModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.BaseModelOutput
|
||||
|
||||
## BaseModelOutputWithPooling
|
||||
|
||||
[[autodoc]] modeling_outputs.BaseModelOutputWithPooling
|
||||
|
||||
## BaseModelOutputWithCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_outputs.BaseModelOutputWithCrossAttentions
|
||||
|
||||
## BaseModelOutputWithPoolingAndCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
|
||||
|
||||
## BaseModelOutputWithPast
|
||||
|
||||
[[autodoc]] modeling_outputs.BaseModelOutputWithPast
|
||||
|
||||
## BaseModelOutputWithPastAndCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_outputs.BaseModelOutputWithPastAndCrossAttentions
|
||||
|
||||
## Seq2SeqModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqModelOutput
|
||||
|
||||
## CausalLMOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.CausalLMOutput
|
||||
|
||||
## CausalLMOutputWithCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_outputs.CausalLMOutputWithCrossAttentions
|
||||
|
||||
## CausalLMOutputWithPast
|
||||
|
||||
[[autodoc]] modeling_outputs.CausalLMOutputWithPast
|
||||
|
||||
## MaskedLMOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.MaskedLMOutput
|
||||
|
||||
## Seq2SeqLMOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqLMOutput
|
||||
|
||||
## NextSentencePredictorOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.NextSentencePredictorOutput
|
||||
|
||||
## SequenceClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.SequenceClassifierOutput
|
||||
|
||||
## Seq2SeqSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqSequenceClassifierOutput
|
||||
|
||||
## MultipleChoiceModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.MultipleChoiceModelOutput
|
||||
|
||||
## TokenClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.TokenClassifierOutput
|
||||
|
||||
## QuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.QuestionAnsweringModelOutput
|
||||
|
||||
## Seq2SeqQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
|
||||
|
||||
## Seq2SeqSpectrogramOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqSpectrogramOutput
|
||||
|
||||
## SemanticSegmenterOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.SemanticSegmenterOutput
|
||||
|
||||
## ImageClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.ImageClassifierOutput
|
||||
|
||||
## ImageClassifierOutputWithNoAttention
|
||||
|
||||
[[autodoc]] modeling_outputs.ImageClassifierOutputWithNoAttention
|
||||
|
||||
## DepthEstimatorOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.DepthEstimatorOutput
|
||||
|
||||
## Wav2Vec2BaseModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Wav2Vec2BaseModelOutput
|
||||
|
||||
## XVectorOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.XVectorOutput
|
||||
|
||||
## Seq2SeqTSModelOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqTSModelOutput
|
||||
|
||||
## Seq2SeqTSPredictionOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.Seq2SeqTSPredictionOutput
|
||||
|
||||
## SampleTSPredictionOutput
|
||||
|
||||
[[autodoc]] modeling_outputs.SampleTSPredictionOutput
|
||||
|
||||
## TFBaseModelOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFBaseModelOutput
|
||||
|
||||
## TFBaseModelOutputWithPooling
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPooling
|
||||
|
||||
## TFBaseModelOutputWithPoolingAndCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions
|
||||
|
||||
## TFBaseModelOutputWithPast
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPast
|
||||
|
||||
## TFBaseModelOutputWithPastAndCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPastAndCrossAttentions
|
||||
|
||||
## TFSeq2SeqModelOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFSeq2SeqModelOutput
|
||||
|
||||
## TFCausalLMOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFCausalLMOutput
|
||||
|
||||
## TFCausalLMOutputWithCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions
|
||||
|
||||
## TFCausalLMOutputWithPast
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFCausalLMOutputWithPast
|
||||
|
||||
## TFMaskedLMOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFMaskedLMOutput
|
||||
|
||||
## TFSeq2SeqLMOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFSeq2SeqLMOutput
|
||||
|
||||
## TFNextSentencePredictorOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFNextSentencePredictorOutput
|
||||
|
||||
## TFSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFSequenceClassifierOutput
|
||||
|
||||
## TFSeq2SeqSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput
|
||||
|
||||
## TFMultipleChoiceModelOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFMultipleChoiceModelOutput
|
||||
|
||||
## TFTokenClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFTokenClassifierOutput
|
||||
|
||||
## TFQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFQuestionAnsweringModelOutput
|
||||
|
||||
## TFSeq2SeqQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput
|
||||
|
||||
## FlaxBaseModelOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutput
|
||||
|
||||
## FlaxBaseModelOutputWithPast
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutputWithPast
|
||||
|
||||
## FlaxBaseModelOutputWithPooling
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutputWithPooling
|
||||
|
||||
## FlaxBaseModelOutputWithPastAndCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions
|
||||
|
||||
## FlaxSeq2SeqModelOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqModelOutput
|
||||
|
||||
## FlaxCausalLMOutputWithCrossAttentions
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions
|
||||
|
||||
## FlaxMaskedLMOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxMaskedLMOutput
|
||||
|
||||
## FlaxSeq2SeqLMOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqLMOutput
|
||||
|
||||
## FlaxNextSentencePredictorOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxNextSentencePredictorOutput
|
||||
|
||||
## FlaxSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxSequenceClassifierOutput
|
||||
|
||||
## FlaxSeq2SeqSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqSequenceClassifierOutput
|
||||
|
||||
## FlaxMultipleChoiceModelOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxMultipleChoiceModelOutput
|
||||
|
||||
## FlaxTokenClassifierOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxTokenClassifierOutput
|
||||
|
||||
## FlaxQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxQuestionAnsweringModelOutput
|
||||
|
||||
## FlaxSeq2SeqQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqQuestionAnsweringModelOutput
|
docs/source/zh/main_classes/pipelines.md (new file)
@ -0,0 +1,474 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Pipelines
|
||||
|
||||
pipelines是使用模型进行推理的一种简单方法。这些pipelines是抽象了库中大部分复杂代码的对象,提供了一个专用于多个任务的简单API,包括命名实体识别、掩码语言建模、情感分析、特征提取和问答等。请参阅[任务摘要](../task_summary)以获取使用示例。
|
||||
|
||||
有两种pipelines抽象类需要注意:
|
||||
|
||||
- [`pipeline`],它是封装所有其他pipelines的最强大的对象。
|
||||
- 针对特定任务pipelines,适用于[音频](#audio)、[计算机视觉](#computer-vision)、[自然语言处理](#natural-language-processing)和[多模态](#multimodal)任务。
|
||||
|
||||
## pipeline抽象类
|
||||
|
||||
*pipeline*抽象类是对所有其他可用pipeline的封装。它可以像任何其他pipeline一样实例化,并且能提供更多的便利。
|
||||
|
||||
简单调用一个项目:
|
||||
|
||||
|
||||
```python
|
||||
>>> pipe = pipeline("text-classification")
|
||||
>>> pipe("This restaurant is awesome")
|
||||
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
|
||||
```
|
||||
|
||||
如果您想使用 [hub](https://huggingface.co) 上的特定模型,并且该模型已经在hub上定义了对应任务,则可以省略任务参数:
|
||||
|
||||
```python
|
||||
>>> pipe = pipeline(model="roberta-large-mnli")
|
||||
>>> pipe("This restaurant is awesome")
|
||||
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
|
||||
```
|
||||
|
||||
要在多个项目上调用pipeline,可以使用*列表*调用它。
|
||||
|
||||
```python
|
||||
>>> pipe = pipeline("text-classification")
|
||||
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
|
||||
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
|
||||
{'label': 'NEGATIVE', 'score': 0.9996669292449951}]
|
||||
```
|
||||
|
||||
为了遍历整个数据集,建议直接使用 `dataset`。这意味着您不需要一次性分配整个数据集,也不需要自己进行批处理。这应该与GPU上的自定义循环一样快。如果不是,请随时提出issue。
|
||||
|
||||
```python
|
||||
import datasets
|
||||
from transformers import pipeline
|
||||
from transformers.pipelines.pt_utils import KeyDataset
|
||||
from tqdm.auto import tqdm
|
||||
|
||||
pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
|
||||
dataset = datasets.load_dataset("superb", name="asr", split="test")
|
||||
|
||||
# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
|
||||
# as we're not interested in the *target* part of the dataset. For sentence pair use KeyPairDataset
|
||||
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
|
||||
print(out)
|
||||
# {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
|
||||
# {"text": ....}
|
||||
# ....
|
||||
```
|
||||
|
||||
为了方便使用,也可以使用生成器:
|
||||
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
|
||||
pipe = pipeline("text-classification")
|
||||
|
||||
|
||||
def data():
|
||||
while True:
|
||||
# This could come from a dataset, a database, a queue or HTTP request
|
||||
# in a server
|
||||
# Caveat: because this is iterative, you cannot use `num_workers > 1` variable
|
||||
# to use multiple threads to preprocess data. You can still have 1 thread that
|
||||
# does the preprocessing while the main runs the big inference
|
||||
yield "This is a test"
|
||||
|
||||
|
||||
for out in pipe(data()):
|
||||
print(out)
|
||||
# {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
|
||||
# {"text": ....}
|
||||
# ....
|
||||
```
|
||||
|
||||
[[autodoc]] pipeline
|
||||
|
||||
## Pipeline batching
|
||||
|
||||
所有pipeline都可以使用批处理。这将在pipeline使用其流处理功能时起作用(即传递列表或 `Dataset` 或 `generator` 时)。
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
from transformers.pipelines.pt_utils import KeyDataset
|
||||
import datasets
|
||||
|
||||
dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
|
||||
pipe = pipeline("text-classification", device=0)
|
||||
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
|
||||
print(out)
|
||||
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
|
||||
# Exactly the same output as before, but the content are passed
|
||||
# as batches to the model
|
||||
```
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
然而,这并不自动意味着性能提升。它可能是一个10倍的加速或5倍的减速,具体取决于硬件、数据和实际使用的模型。
|
||||
|
||||
主要是加速的示例:
|
||||
|
||||
</Tip>
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
from torch.utils.data import Dataset
|
||||
from tqdm.auto import tqdm
|
||||
|
||||
pipe = pipeline("text-classification", device=0)
|
||||
|
||||
|
||||
class MyDataset(Dataset):
|
||||
def __len__(self):
|
||||
return 5000
|
||||
|
||||
def __getitem__(self, i):
|
||||
return "This is a test"
|
||||
|
||||
|
||||
dataset = MyDataset()
|
||||
|
||||
for batch_size in [1, 8, 64, 256]:
|
||||
print("-" * 30)
|
||||
print(f"Streaming batch_size={batch_size}")
|
||||
for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
|
||||
pass
|
||||
```
|
||||
|
||||
```
|
||||
# On GTX 970
|
||||
------------------------------
|
||||
Streaming no batching
|
||||
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
|
||||
------------------------------
|
||||
Streaming batch_size=8
|
||||
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
|
||||
------------------------------
|
||||
Streaming batch_size=64
|
||||
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
|
||||
------------------------------
|
||||
Streaming batch_size=256
|
||||
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
|
||||
(diminishing returns, saturated the GPU)
|
||||
```
|
||||
|
||||
主要是减速的示例:
|
||||
|
||||
```python
|
||||
class MyDataset(Dataset):
|
||||
def __len__(self):
|
||||
return 5000
|
||||
|
||||
def __getitem__(self, i):
|
||||
if i % 64 == 0:
|
||||
n = 100
|
||||
else:
|
||||
n = 1
|
||||
return "This is a test" * n
|
||||
```
|
||||
|
||||
与其他句子相比,这是一个非常长的句子。在这种情况下,**整个**批次将需要400个tokens的长度,因此整个批次将是 [64, 400] 而不是 [64, 4],从而导致较大的减速。更糟糕的是,在更大的批次上,程序会崩溃。
|
||||
|
||||
```
|
||||
------------------------------
|
||||
Streaming no batching
|
||||
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
|
||||
------------------------------
|
||||
Streaming batch_size=8
|
||||
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
|
||||
------------------------------
|
||||
Streaming batch_size=64
|
||||
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
|
||||
------------------------------
|
||||
Streaming batch_size=256
|
||||
0%| | 0/1000 [00:00<?, ?it/s]
|
||||
Traceback (most recent call last):
|
||||
File "/home/nicolas/src/transformers/test.py", line 42, in <module>
|
||||
for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
|
||||
....
|
||||
q = q / math.sqrt(dim_per_head) # (bs, n_heads, q_length, dim_per_head)
|
||||
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
|
||||
```
|
||||
|
||||
对于这个问题,没有好的(通用)解决方案,效果可能因您的用例而异。经验法则如下:
|
||||
|
||||
对于用户,一个经验法则是:
|
||||
|
||||
- **使用硬件测量负载性能。测量、测量、再测量。真实的数字是唯一的方法。**
|
||||
- 如果受到延迟的限制(进行推理的实时产品),不要进行批处理。
|
||||
- 如果使用CPU,不要进行批处理。
|
||||
- 如果您在GPU上处理的是吞吐量(您希望在大量静态数据上运行模型),则:
|
||||
- 如果对序列长度的大小没有概念("自然"数据),默认情况下不要进行批处理,进行测试并尝试逐渐添加,添加OOM检查以在失败时恢复(如果您不能控制序列长度,它将在某些时候失败)。
|
||||
- 如果您的序列长度非常规律,那么批处理更有可能非常有趣,进行测试并推动它,直到出现OOM。
|
||||
- GPU越大,批处理越有可能变得更有趣
|
||||
- 一旦启用批处理,确保能够很好地处理OOM。
|
||||
|
||||
## Pipeline chunk batching
|
||||
|
||||
`zero-shot-classification` 和 `question-answering` 在某种意义上稍微特殊,因为单个输入可能会导致模型的多次前向传递。在正常情况下,这将导致 `batch_size` 参数的问题。
|
||||
|
||||
为了规避这个问题,这两个pipeline都有点特殊,它们是 `ChunkPipeline` 而不是常规的 `Pipeline`。简而言之:
|
||||
|
||||
|
||||
```python
|
||||
preprocessed = pipe.preprocess(inputs)
|
||||
model_outputs = pipe.forward(preprocessed)
|
||||
outputs = pipe.postprocess(model_outputs)
|
||||
```
|
||||
|
||||
现在变成:
|
||||
|
||||
|
||||
```python
|
||||
all_model_outputs = []
|
||||
for preprocessed in pipe.preprocess(inputs):
|
||||
model_outputs = pipe.forward(preprocessed)
|
||||
all_model_outputs.append(model_outputs)
|
||||
outputs = pipe.postprocess(all_model_outputs)
|
||||
```
|
||||
|
||||
这对您的代码来说应该是完全透明的,因为pipeline的使用方式保持不变。
|
||||
|
||||
这是一个简化的视图,因为Pipeline可以自动处理批次!这意味着您不必担心您的输入实际上会触发多少次前向传递,您可以独立于输入优化 `batch_size`。前面部分的注意事项仍然适用。
|
||||
|
||||
## Pipeline自定义
|
||||
|
||||
如果您想要重载某个特定的pipeline,请随时为您手头的任务创建一个issue:Pipeline的目标是易于使用并支持大多数场景,因此 `transformers` 也许能够支持您的用例。
|
||||
|
||||
如果您想简单地尝试一下,可以:
|
||||
|
||||
- 继承您选择的pipeline
|
||||
|
||||
```python
|
||||
class MyPipeline(TextClassificationPipeline):
|
||||
def postprocess():
|
||||
# Your code goes here
|
||||
scores = scores * 100
|
||||
# And here
|
||||
|
||||
|
||||
my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
|
||||
# or if you use *pipeline* function, then:
|
||||
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
|
||||
```
|
||||
|
||||
这样就可以让您编写所有想要的自定义代码。
|
||||
|
||||
|
||||
## 实现一个pipeline
|
||||
|
||||
[实现一个新的pipeline](../add_new_pipeline)
|
||||
|
||||
## 音频
|
||||
|
||||
可用于音频任务的pipeline包括以下几种。
|
||||
|
||||
### AudioClassificationPipeline
|
||||
|
||||
[[autodoc]] AudioClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### AutomaticSpeechRecognitionPipeline
|
||||
|
||||
[[autodoc]] AutomaticSpeechRecognitionPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### TextToAudioPipeline
|
||||
|
||||
[[autodoc]] TextToAudioPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
|
||||
### ZeroShotAudioClassificationPipeline
|
||||
|
||||
[[autodoc]] ZeroShotAudioClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
## 计算机视觉
|
||||
|
||||
可用于计算机视觉任务的pipeline包括以下几种。
|
||||
|
||||
### DepthEstimationPipeline
|
||||
[[autodoc]] DepthEstimationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ImageClassificationPipeline
|
||||
|
||||
[[autodoc]] ImageClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ImageSegmentationPipeline
|
||||
|
||||
[[autodoc]] ImageSegmentationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ImageToImagePipeline
|
||||
|
||||
[[autodoc]] ImageToImagePipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ObjectDetectionPipeline
|
||||
|
||||
[[autodoc]] ObjectDetectionPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### VideoClassificationPipeline
|
||||
|
||||
[[autodoc]] VideoClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ZeroShotImageClassificationPipeline
|
||||
|
||||
[[autodoc]] ZeroShotImageClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ZeroShotObjectDetectionPipeline
|
||||
|
||||
[[autodoc]] ZeroShotObjectDetectionPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
## 自然语言处理
|
||||
|
||||
可用于自然语言处理任务的pipeline包括以下几种。
|
||||
|
||||
### ConversationalPipeline
|
||||
|
||||
[[autodoc]] Conversation
|
||||
|
||||
[[autodoc]] ConversationalPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### FillMaskPipeline
|
||||
|
||||
[[autodoc]] FillMaskPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### NerPipeline
|
||||
|
||||
[[autodoc]] NerPipeline
|
||||
|
||||
See [`TokenClassificationPipeline`] for all details.
|
||||
|
||||
### QuestionAnsweringPipeline
|
||||
|
||||
[[autodoc]] QuestionAnsweringPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### SummarizationPipeline
|
||||
|
||||
[[autodoc]] SummarizationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### TableQuestionAnsweringPipeline
|
||||
|
||||
[[autodoc]] TableQuestionAnsweringPipeline
|
||||
- __call__
|
||||
|
||||
### TextClassificationPipeline
|
||||
|
||||
[[autodoc]] TextClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### TextGenerationPipeline
|
||||
|
||||
[[autodoc]] TextGenerationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### Text2TextGenerationPipeline
|
||||
|
||||
[[autodoc]] Text2TextGenerationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### TokenClassificationPipeline
|
||||
|
||||
[[autodoc]] TokenClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### TranslationPipeline
|
||||
|
||||
[[autodoc]] TranslationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ZeroShotClassificationPipeline
|
||||
|
||||
[[autodoc]] ZeroShotClassificationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
## 多模态
|
||||
|
||||
可用于多模态任务的pipeline包括以下几种。
|
||||
|
||||
### DocumentQuestionAnsweringPipeline
|
||||
|
||||
[[autodoc]] DocumentQuestionAnsweringPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### FeatureExtractionPipeline
|
||||
|
||||
[[autodoc]] FeatureExtractionPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### ImageToTextPipeline
|
||||
|
||||
[[autodoc]] ImageToTextPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### MaskGenerationPipeline
|
||||
|
||||
[[autodoc]] MaskGenerationPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
### VisualQuestionAnsweringPipeline
|
||||
|
||||
[[autodoc]] VisualQuestionAnsweringPipeline
|
||||
- __call__
|
||||
- all
|
||||
|
||||
## Parent class: `Pipeline`
|
||||
|
||||
[[autodoc]] Pipeline
|
docs/source/zh/main_classes/processors.md (new file)
@ -0,0 +1,146 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Processors
|
||||
|
||||
在 Transformers 库中,processors可以有两种不同的含义:
|
||||
- 为多模态模型,例如[Wav2Vec2](../model_doc/wav2vec2)(语音和文本)或[CLIP](../model_doc/clip)(文本和视觉)预处理输入的对象
|
||||
- 在库的旧版本中用于预处理GLUE或SQUAD数据的已弃用对象。
|
||||
|
||||
## 多模态processors
|
||||
|
||||
任何多模态模型都需要一个对象来编码或解码将多个模态(包括文本、视觉和音频)组合在一起的数据。这由称为processors的对象处理,这些processors将两个或多个处理对象组合在一起,例如tokenizers(用于文本模态),image processors(用于视觉)和feature extractors(用于音频)。
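下面是一个使用CLIP processor同时处理文本和图像的简化示例(其中的checkpoint与占位图像仅为演示假设):

```python
import numpy as np
from PIL import Image
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")  # 演示用checkpoint

image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # 占位图像
inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)
print(list(inputs.keys()))  # input_ids、attention_mask、pixel_values
```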
|
||||
|
||||
这些processors继承自以下实现保存和加载功能的基类:
|
||||
|
||||
|
||||
[[autodoc]] ProcessorMixin
|
||||
|
||||
## 已弃用的processors
|
||||
|
||||
所有processor都遵循与 [`~data.processors.utils.DataProcessor`] 相同的架构。processor返回一个 [`~data.processors.utils.InputExample`] 列表。这些 [`~data.processors.utils.InputExample`] 可以转换为 [`~data.processors.utils.InputFeatures`],以便输入给模型。
|
||||
|
||||
[[autodoc]] data.processors.utils.DataProcessor
|
||||
|
||||
[[autodoc]] data.processors.utils.InputExample
|
||||
|
||||
[[autodoc]] data.processors.utils.InputFeatures
|
||||
|
||||
## GLUE
|
||||
|
||||
[General Language Understanding Evaluation (GLUE)](https://gluebenchmark.com/) 是一个基准测试,评估模型在各种现有的自然语言理解任务上的性能。它与论文 [GLUE: A multi-task benchmark and analysis platform for natural language understanding](https://openreview.net/pdf?id=rJ4km2R5t7) 一同发布。
|
||||
|
||||
该库为以下任务提供了总共10个processor:MRPC、MNLI、MNLI(mismatched)、CoLA、SST2、STSB、QQP、QNLI、RTE 和 WNLI。
|
||||
|
||||
这些processor是:
|
||||
|
||||
- [`~data.processors.utils.MrpcProcessor`]
|
||||
- [`~data.processors.utils.MnliProcessor`]
|
||||
- [`~data.processors.utils.MnliMismatchedProcessor`]
|
||||
- [`~data.processors.utils.Sst2Processor`]
|
||||
- [`~data.processors.utils.StsbProcessor`]
|
||||
- [`~data.processors.utils.QqpProcessor`]
|
||||
- [`~data.processors.utils.QnliProcessor`]
|
||||
- [`~data.processors.utils.RteProcessor`]
|
||||
- [`~data.processors.utils.WnliProcessor`]
|
||||
|
||||
此外,还可以使用以下方法从数据文件加载值并将其转换为 [`~data.processors.utils.InputExample`] 列表。
|
||||
|
||||
[[autodoc]] data.processors.glue.glue_convert_examples_to_features
|
||||
|
||||
|
||||
## XNLI
|
||||
|
||||
[跨语言NLI语料库(XNLI)](https://www.nyu.edu/projects/bowman/xnli/) 是一个评估跨语言文本表示质量的基准测试。XNLI是一个基于[*MultiNLI*](http://www.nyu.edu/projects/bowman/multinli/)的众包数据集:其中的文本对被标注了涵盖15种不同语言(既包括英语等高资源语言,也包括斯瓦希里语等低资源语言)的文本蕴涵标签。
|
||||
|
||||
它与论文 [XNLI: Evaluating Cross-lingual Sentence Representations](https://arxiv.org/abs/1809.05053) 一同发布。
|
||||
|
||||
该库提供了加载XNLI数据的processor:
|
||||
|
||||
- [`~data.processors.utils.XnliProcessor`]
|
||||
|
||||
请注意,由于测试集上有“gold”标签,因此评估是在测试集上进行的。
|
||||
|
||||
使用这些processor的示例在 [run_xnli.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_xnli.py) 脚本中提供。
|
||||
|
||||
|
||||
## SQuAD
|
||||
|
||||
[斯坦福问答数据集(SQuAD)](https://rajpurkar.github.io/SQuAD-explorer//) 是一个评估模型在问答上性能的基准测试。有两个版本,v1.1 和 v2.0。第一个版本(v1.1)与论文 [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250) 一同发布。第二个版本(v2.0)与论文 [Know What You Don't Know: Unanswerable Questions for SQuAD](https://arxiv.org/abs/1806.03822) 一同发布。
|
||||
|
||||
该库为两个版本各自提供了一个processor:
|
||||
|
||||
### Processors
|
||||
|
||||
这两个processor是:
|
||||
|
||||
- [`~data.processors.utils.SquadV1Processor`]
|
||||
- [`~data.processors.utils.SquadV2Processor`]
|
||||
|
||||
它们都继承自抽象类 [`~data.processors.utils.SquadProcessor`]。
|
||||
|
||||
[[autodoc]] data.processors.squad.SquadProcessor
|
||||
- all
|
||||
|
||||
此外,可以使用以下方法将 SQuAD 示例转换为可用作模型输入的 [`~data.processors.utils.SquadFeatures`]。
|
||||
|
||||
[[autodoc]] data.processors.squad.squad_convert_examples_to_features
|
||||
|
||||
|
||||
这些processor以及前面提到的方法可以与包含数据的文件以及tensorflow_datasets包一起使用。下面给出了示例。
|
||||
|
||||
|
||||
### 使用示例

以下是使用processor以及利用数据文件进行转换的示例:

```python
from transformers import SquadV1Processor, SquadV2Processor, squad_convert_examples_to_features

# Loading a V2 processor
processor = SquadV2Processor()
examples = processor.get_dev_examples(squad_v2_data_dir)

# Loading a V1 processor
processor = SquadV1Processor()
examples = processor.get_dev_examples(squad_v1_data_dir)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    doc_stride=args.doc_stride,
    max_query_length=max_query_length,
    is_training=not evaluate,
)
```

使用 *tensorflow_datasets* 就像使用数据文件一样简单:

```python
# tensorflow_datasets only handles SQuAD V1.
import tensorflow_datasets as tfds

tfds_examples = tfds.load("squad")
examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    doc_stride=args.doc_stride,
    max_query_length=max_query_length,
    is_training=not evaluate,
)
```

另一个使用这些processor的示例在 [run_squad.py](https://github.com/huggingface/transformers/tree/main/examples/legacy/question-answering/run_squad.py) 脚本中提供。
572
docs/source/zh/main_classes/quantization.md
Normal file
@ -0,0 +1,572 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# 量化 🤗 Transformers 模型

## AWQ集成

AWQ方法已经在[*AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration*论文](https://arxiv.org/abs/2306.00978)中引入。通过AWQ,您可以以4位精度运行模型,同时保留其原始性能(即没有性能降级),并具有比下面介绍的其他量化方法更出色的吞吐量 - 达到与纯`float16`推理相似的吞吐量。

我们现在支持使用任何AWQ模型进行推理,这意味着任何人都可以加载和使用在Hub上推送或本地保存的AWQ权重。请注意,使用AWQ需要访问NVIDIA GPU。目前不支持CPU推理。

### 量化一个模型

我们建议用户查看生态系统中不同的现有工具,以使用AWQ算法对其模型进行量化,例如:

- [`llm-awq`](https://github.com/mit-han-lab/llm-awq),来自MIT Han Lab
- [`autoawq`](https://github.com/casper-hansen/AutoAWQ),来自[`casper-hansen`](https://github.com/casper-hansen)
- Intel neural compressor,来自Intel - 通过[`optimum-intel`](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)使用

生态系统中可能存在许多其他工具,请随时提出PR将它们添加到列表中。

目前与🤗 Transformers的集成仅适用于使用`autoawq`和`llm-awq`量化后的模型。大多数使用`autoawq`量化的模型可以在🤗 Hub的[`TheBloke`](https://huggingface.co/TheBloke)命名空间下找到。要使用`llm-awq`对模型进行量化,请参阅[`llm-awq`](https://github.com/mit-han-lab/llm-awq/)示例文件夹中的[`convert_to_hf.py`](https://github.com/mit-han-lab/llm-awq/blob/main/examples/convert_to_hf.py)脚本。
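作为参考,下面给出一个基于 `autoawq` 的量化流程草图。这里假设使用其 `AutoAWQForCausalLM` 接口,模型、保存目录与 `quant_config` 的取值均为示例;具体参数名与默认值请以 `autoawq` 自身的文档为准。

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"   # 待量化的模型,仅作示例
quant_path = "opt-125m-awq"        # 量化后权重的保存目录,仅作示例
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# 执行 AWQ 校准与量化,并保存结果
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```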
### 加载一个量化的模型

您可以使用`from_pretrained`方法从Hub加载一个量化模型。请通过检查模型配置文件(`configuration.json`)中是否存在`quantization_config`属性,来确认推送的权重是量化的。您可以通过检查字段`quantization_config.quant_method`来确认模型是否以AWQ格式进行量化,该字段应该设置为`"awq"`。请注意,出于性能原因,默认情况下加载模型时会将其他权重设置为`float16`。如果您想更改这种设置,可以将`torch_dtype`参数设置为`torch.float32`或`torch.bfloat16`。在下面的部分中,您可以找到一些示例片段和notebook。
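例如,可以先只加载配置,快速确认一个Hub仓库是否为AWQ量化权重(下面的检查方式仅为示意,仓库名沿用后文的示例模型):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TheBloke/zephyr-7B-alpha-AWQ")

# 若为量化模型,配置中应包含 quantization_config,其中 quant_method 应为 "awq"
print(config.quantization_config)
```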
## 示例使用

首先,您需要安装[`autoawq`](https://github.com/casper-hansen/AutoAWQ)库:

```bash
pip install autoawq
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/zephyr-7B-alpha-AWQ"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda:0")
```

如果您首先将模型加载到CPU上,请确保在使用之前将其移动到GPU设备上。

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/zephyr-7B-alpha-AWQ"
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda:0")
```

### 结合 AWQ 和 Flash Attention

您可以将AWQ量化与Flash Attention结合起来,得到一个既被量化又更快速的模型。只需使用`from_pretrained`加载模型,并传递`use_flash_attention_2=True`参数。

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("TheBloke/zephyr-7B-alpha-AWQ", use_flash_attention_2=True, device_map="cuda:0")
```
### 基准测试

我们使用[`optimum-benchmark`](https://github.com/huggingface/optimum-benchmark)库进行了一些速度、吞吐量和延迟基准测试。

请注意,在编写本文档部分时,可用的量化方法包括:`awq`、`gptq`和`bitsandbytes`。

基准测试在一台NVIDIA-A100实例上运行,使用[`TheBloke/Mistral-7B-v0.1-AWQ`](https://huggingface.co/TheBloke/Mistral-7B-v0.1-AWQ)作为AWQ模型,[`TheBloke/Mistral-7B-v0.1-GPTQ`](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ)作为GPTQ模型。我们还将其与`bitsandbytes`量化模型和`float16`模型进行了对比。以下是一些结果示例:

<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/quantization/forward_memory_plot.png">
</div>

<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/quantization/generate_memory_plot.png">
</div>

<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/quantization/generate_throughput_plot.png">
</div>

<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/quantization/forward_latency_plot.png">
</div>

你可以在[此链接](https://github.com/huggingface/optimum-benchmark/tree/main/examples/running-mistrals)中找到完整的结果以及包版本。

从结果来看,AWQ量化方法在推理和文本生成中速度最快,并且在文本生成时的峰值显存占用也最低。不过,AWQ的单批次前向延迟似乎是最大的。

### Google colab 演示

查看如何在[Google Colab演示](https://colab.research.google.com/drive/1HzZH89yAXJaZgwJDhQj9LqSBux932BvY)中使用此集成!

### AwqConfig

[[autodoc]] AwqConfig
## `AutoGPTQ` 集成

🤗 Transformers已经整合了`optimum` API,用于对语言模型执行GPTQ量化。您可以以8、4、3甚至2位加载和量化您的模型,而性能无明显下降,并且推理速度更快!大多数GPU硬件都支持这一功能。

要了解更多关于量化模型的信息,请查看:
- [GPTQ](https://arxiv.org/pdf/2210.17323.pdf)论文
- `optimum`关于GPTQ量化的[指南](https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization)
- 用作后端的[`AutoGPTQ`](https://github.com/PanQiWei/AutoGPTQ)库

### 要求

为了运行下面的代码,您需要安装:

- 安装最新版本的 `AutoGPTQ` 库:
`pip install auto-gptq`

- 从源代码安装最新版本的`optimum`:
`pip install git+https://github.com/huggingface/optimum.git`

- 从源代码安装最新版本的`transformers`:
`pip install git+https://github.com/huggingface/transformers.git`

- 安装最新版本的`accelerate`库:
`pip install --upgrade accelerate`

请注意,目前GPTQ集成仅支持文本模型,对于视觉、语音或多模态模型可能会遇到预期以外的结果。
### 加载和量化模型

GPTQ是一种在使用量化模型之前需要进行权重校准的量化方法。如果您想从头开始对transformers模型进行量化,生成量化模型可能需要一些时间(在Google Colab上对`facebook/opt-350m`模型量化约为5分钟)。

因此,有两种不同的情况您可能会想使用GPTQ量化模型:第一种情况是加载已经由其他用户在Hub上量化的模型;第二种情况是从头开始对您的模型进行量化并保存或推送到Hub,以便其他用户也可以使用它。

#### GPTQ 配置

为了加载和量化一个模型,您需要创建一个[`GPTQConfig`]。您需要传递`bits`的数量,一个用于校准量化的`dataset`,以及模型的`tokenizer`以准备数据集。

```python
from transformers import AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
```

请注意,您也可以将自己的数据集以字符串列表的形式传入。然而,强烈建议您使用GPTQ论文中提供的数据集。

```python
dataset = ["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."]
quantization = GPTQConfig(bits=4, dataset=dataset, tokenizer=tokenizer)
```
#### 量化

您可以通过使用`from_pretrained`并设置`quantization_config`来对模型进行量化。

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=gptq_config)
```

请注意,您需要一个GPU来量化模型。我们会将模型放在CPU上,并在量化时将各个模块来回移动到GPU。

如果您想在使用 CPU 卸载的同时最大化 GPU 使用率,您可以设置 `device_map = "auto"`。

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=gptq_config)
```

请注意,不支持磁盘卸载。此外,如果由于数据集而内存不足,您可能需要在`from_pretrained`中设置`max_memory`。查看这个[指南](https://huggingface.co/docs/accelerate/usage_guides/big_modeling#designing-a-device-map)以了解有关`device_map`和`max_memory`的更多信息。

<Tip warning={true}>
目前,GPTQ量化仅适用于文本模型。此外,量化过程可能会花费很多时间,具体取决于硬件性能(175B模型在NVIDIA A100上需要4小时)。请在Hub上检查是否有模型的GPTQ量化版本。如果没有,您可以在GitHub上提交需求。
</Tip>
### 推送量化模型到 🤗 Hub

您可以像推送任何模型一样,使用`push_to_hub`将量化模型推送到Hub。量化配置将与模型一起保存和推送。

```python
quantized_model.push_to_hub("opt-125m-gptq")
tokenizer.push_to_hub("opt-125m-gptq")
```

如果您想在本地计算机上保存量化模型,您也可以使用`save_pretrained`来完成:

```python
quantized_model.save_pretrained("opt-125m-gptq")
tokenizer.save_pretrained("opt-125m-gptq")
```

请注意,如果您在量化模型时使用了`device_map`,请确保在保存之前将整个模型移动到某一个GPU或CPU上。

```python
quantized_model.to("cpu")
quantized_model.save_pretrained("opt-125m-gptq")
```
### 从 🤗 Hub 加载一个量化模型

您可以使用`from_pretrained`从Hub加载量化模型。
请通过检查模型配置对象中是否存在`quantization_config`属性,来确认推送的权重是量化的。

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq")
```

如果您想更快地加载模型,并且不分配超出实际所需的内存,请对量化模型同样设置`device_map`参数。请确保您已安装`accelerate`库。

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="auto")
```
### Exllama内核加快推理速度

对于 4 位模型,您可以使用 exllama 内核来提高推理速度。默认情况下,它处于启用状态。您可以通过在 [`GPTQConfig`] 中传递 `use_exllama` 来更改此配置。这将覆盖存储在配置中的量化配置。请注意,您只能覆盖与内核相关的属性。此外,如果您想使用 exllama 内核,整个模型需要全部部署在 GPU 上。此外,您可以使用版本高于 0.4.2 的 Auto-GPTQ 并传递 `device_map = "cpu"` 来执行 CPU 推理。对于 CPU 推理,您必须在 `GPTQConfig` 中传递 `use_exllama = False`。

```py
from transformers import AutoModelForCausalLM, GPTQConfig

gptq_config = GPTQConfig(bits=4)
model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="auto", quantization_config=gptq_config)
```
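作为对照,下面是一个在 CPU 上推理时关闭 exllama 内核的示意(假设沿用上文的量化权重仓库,并使用高于 0.4.2 版本的 Auto-GPTQ):

```py
from transformers import AutoModelForCausalLM, GPTQConfig

# CPU 推理必须禁用 exllama 内核
gptq_config = GPTQConfig(bits=4, use_exllama=False)
model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="cpu", quantization_config=gptq_config)
```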
随着 exllamav2 内核的发布,与 exllama 内核相比,您可以获得更快的推理速度。您只需在 [`GPTQConfig`] 中传递 `exllama_config={"version": 2}`:

```py
from transformers import AutoModelForCausalLM, GPTQConfig

gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="auto", quantization_config=gptq_config)
```

请注意,目前仅支持 4 位模型。此外,如果您正在使用 peft 对量化模型进行微调,建议禁用 exllama 内核。

您可以在[这里](https://github.com/huggingface/optimum/tree/main/tests/benchmark#gptq-benchmark)找到这些内核的基准测试。

#### 微调一个量化模型

在Hugging Face生态系统的官方支持下,您可以对使用GPTQ量化后的模型进行微调。
请查看[`peft`](https://github.com/huggingface/peft)库了解更多详情。
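下面是一个极简的微调准备草图。这里假设已安装 `peft`,并已按上文加载了 4 位 GPTQ 量化模型 `model`;`target_modules` 等取值取决于具体模型结构,这里以 OPT 为例,仅作示意。

```py
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 为量化模型的训练做准备(冻结原始权重、处理输入梯度等)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # 仅作示例,随模型结构而定
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# 在量化模型之上添加可训练的 LoRA 适配器
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```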
### 示例演示

请查看 Google Colab [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94ilkUFu6ZX4ceb?usp=sharing),了解如何使用GPTQ量化您的模型以及如何使用peft微调量化模型。

### GPTQConfig

[[autodoc]] GPTQConfig
## `bitsandbytes` 集成

🤗 Transformers 与 `bitsandbytes` 中最常用的模块紧密集成。您可以使用几行代码以 8 位精度加载您的模型。
自 bitsandbytes 的 0.37.0 版本发布以来,大多数 GPU 硬件都支持这一点。

在 [LLM.int8()](https://arxiv.org/abs/2208.07339) 论文中了解更多关于量化方法的信息,或者在[博客文章](https://huggingface.co/blog/hf-bitsandbytes-integration)中了解关于此次合作的更多信息。

自其 `0.39.0` 版本发布以来,您可以使用 FP4 数据类型,通过 4 位量化加载任何支持 `device_map` 的模型。

如果您想量化自己的 pytorch 模型,请查看 🤗 Accelerate 的[文档](https://huggingface.co/docs/accelerate/main/en/usage_guides/quantization)。

以下是您可以使用 `bitsandbytes` 集成完成的事情:
### 通用用法

只要您的模型支持使用 🤗 Accelerate 进行加载并包含 `torch.nn.Linear` 层,您可以在调用 [`~PreTrainedModel.from_pretrained`] 方法时使用 `load_in_8bit` 或 `load_in_4bit` 参数来量化模型。这也应该适用于任何模态。

```python
from transformers import AutoModelForCausalLM

model_8bit = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", load_in_8bit=True)
model_4bit = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", load_in_4bit=True)
```

默认情况下,所有其他模块(例如 `torch.nn.LayerNorm`)将被转换为 `torch.float16` 类型。但如果您想更改它们的 `dtype`,可以重载 `torch_dtype` 参数:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM

>>> model_8bit = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", load_in_8bit=True, torch_dtype=torch.float32)
>>> model_8bit.model.decoder.layers[-1].final_layer_norm.weight.dtype
torch.float32
```
### FP4 量化

#### 要求

确保在运行以下代码段之前已满足以下要求:

- 安装最新版本的 `bitsandbytes` 库:
`pip install bitsandbytes>=0.39.0`

- 安装最新版本的 `accelerate`:
`pip install --upgrade accelerate`

- 安装最新版本的 `transformers`:
`pip install --upgrade transformers`
#### 提示和最佳实践

- **高级用法:** 请参考 [此 Google Colab notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf) 以获取 4 位量化高级用法和所有可选选项。

- **使用 `batch_size=1` 实现更快的推理:** 自 `bitsandbytes` 的 `0.40.0` 版本起,只要设置 `batch_size=1`,您就可以从更快的推理中受益。请查看 [这些发布说明](https://github.com/TimDettmers/bitsandbytes/releases/tag/0.40.0),并确保使用大于 `0.40.0` 的版本以直接利用此功能。

- **训练:** 根据 [QLoRA 论文](https://arxiv.org/abs/2305.14314),对于4位基模型训练(使用 LoRA 适配器),应使用 `bnb_4bit_quant_type='nf4'`。

- **推理:** 对于推理,`bnb_4bit_quant_type` 对性能影响不大。但是为了与模型的权重保持一致,请确保使用相同的 `bnb_4bit_compute_dtype` 和 `torch_dtype` 参数。
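下面是一个将上述建议组合在一起的 4 位量化配置草图(模型与参数取值仅作示例,后文会分别介绍各个选项):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 组合 NF4、嵌套量化与 bfloat16 计算类型(均为可选项,仅作示意)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=bnb_config)
```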
#### 加载 4 位量化的大模型

在调用 `.from_pretrained` 方法时使用 `load_in_4bit=True`,可以将您的内存使用量减少到大约原来的 1/4。

```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-1b7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_4bit=True)
```

<Tip warning={true}>

需要注意的是,一旦模型以 4 位量化方式加载,就无法将量化后的权重推送到 Hub 上。此外,您不能训练 4 位量化权重,因为目前尚不支持此功能。但是,您可以使用 4 位量化模型来训练额外参数,这将在下一部分中介绍。

</Tip>
### 加载 8 位量化的大模型

您可以通过在调用 `.from_pretrained` 方法时使用 `load_in_8bit=True` 参数,将内存需求大致减半来加载模型。

```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-1b7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
```

然后,像通常使用 `PreTrainedModel` 一样使用您的模型。

您可以使用 `get_memory_footprint` 方法检查模型的内存占用。

```python
print(model.get_memory_footprint())
```

通过这种集成,我们能够在较小的设备上加载大模型并运行它们而没有任何问题。

<Tip warning={true}>

需要注意的是,一旦模型以 8 位量化方式加载,除非使用最新版本的 `transformers` 和 `bitsandbytes`,否则目前无法将量化后的权重推送到 Hub 上。此外,您不能训练 8 位量化权重,因为目前尚不支持此功能。但是,您可以使用 8 位量化模型来训练额外参数,这将在下一部分中介绍。

注意,`device_map` 是可选的,但设置 `device_map = 'auto'` 更适合用于推理,因为它会在可用资源上更高效地分配模型。

</Tip>
#### 高级用例

在这里,我们将介绍使用 FP4 量化的一些高级用例。

##### 更改计算数据类型

计算数据类型用于改变在进行计算时使用的数据类型。例如,hidden states可以是 `float32`,但为了加速,计算时可以被设置为 `bf16`。默认情况下,计算数据类型被设置为 `float32`。

```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
```
#### 使用 NF4(Normal Float 4)数据类型

您还可以使用 NF4 数据类型,这是一种为正态分布初始化的权重而设计的新型 4 位数据类型。运行方式如下:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# model_id 为待加载模型在 Hub 上的标识
model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config)
```
#### 使用嵌套量化进行更高效的内存推理

我们还建议用户使用嵌套量化技术。从我们的经验观察来看,这种方法可以在不带来额外性能损失的情况下节省更多内存。例如,它使得 llama-13b 模型能够在序列长度为 1024、批次大小为 1、梯度累积步数为 4 的条件下,在一块 16GB 的 NVIDIA-T4 上进行微调。

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
)

# model_id 为待加载模型在 Hub 上的标识
model_double_quant = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=double_quant_config)
```
### 将量化模型推送到🤗 Hub

您可以使用 `push_to_hub` 方法将量化模型推送到 Hub 上。这将首先推送量化配置文件,然后推送量化模型权重。
请确保使用 `bitsandbytes>0.37.2`(在撰写本文时,我们使用的是 `bitsandbytes==0.38.0.post1`)才能使用此功能。

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

model.push_to_hub("bloom-560m-8bit")
```

<Tip warning={true}>

对于大模型,强烈建议将 8 位量化模型推送到 Hub 上,以便让社区从更小的内存占用中受益,例如在 Google Colab 上加载大模型。

</Tip>
### 从🤗 Hub加载量化模型

您可以使用 `from_pretrained` 方法从 Hub 加载量化模型。请通过检查模型配置对象中是否存在 `quantization_config` 属性,来确认推送的权重是量化的。

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("{your_username}/bloom-560m-8bit", device_map="auto")
```

请注意,在这种情况下,您不需要指定 `load_in_8bit=True` 参数,但需要确保 `bitsandbytes` 和 `accelerate` 已安装。
请注意,`device_map` 是可选的,但设置 `device_map = 'auto'` 更适合用于推理,因为它会在可用资源上更高效地分配模型。
### 高级用例

本节面向希望探索除了加载和运行 8 位模型之外还能做什么的进阶用户。

#### 在 `cpu` 和 `gpu` 之间卸载

其中一个高级用例是加载模型,并将权重分派到 `CPU` 和 `GPU` 之间。请注意,将在 CPU 上分派的权重 **不会** 转换为 8 位,因此会保留为 `float32`。此功能适用于想要加载非常大的模型、并将其分派到 GPU 和 CPU 之间的用户。

首先,从 `transformers` 中加载一个 [`BitsAndBytesConfig`],并将属性 `llm_int8_enable_fp32_cpu_offload` 设置为 `True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
```

假设您想加载 `bigscience/bloom-1b7` 模型,而您的 GPU 显存仅足够容纳除 `lm_head` 外的整个模型。因此,您可以按照以下方式编写自定义的 device_map:

```python
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
```

然后如下加载模型:

```python
model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map=device_map,
    quantization_config=quantization_config,
)
```

这就是全部内容!享受您的模型吧!
#### 使用`llm_int8_threshold`

您可以使用 `llm_int8_threshold` 参数来更改异常值的阈值。“异常值”是一个大于特定阈值的`hidden state`值。
这对应于`LLM.int8()`论文中描述的异常检测的异常阈值。任何高于此阈值的`hidden state`值都将被视为异常值,对这些值的操作将在 fp16 中完成。值通常是正态分布的,也就是说,大多数值在 [-3.5, 3.5] 范围内,但有一些额外的系统异常值,对于大模型来说,它们的分布非常不同。这些异常值通常在区间 [-60, -6] 或 [6, 60] 内。Int8 量化对于幅度为 ~5 的值效果很好,但超出这个范围,性能就会明显下降。一个好的默认阈值是 6,但对于更不稳定的模型(小模型、微调)可能需要更低的阈值。
这个参数会影响模型的推理速度。我们建议多尝试几个值,以找到最适合您用例的设置。

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "bigscience/bloom-1b7"

quantization_config = BitsAndBytesConfig(
    llm_int8_threshold=10,
)

# device_map 沿用上一节中自定义的设备映射
model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
#### 跳过某些模块的转换

一些模型有几个需要保持未转换状态以确保稳定性的模块。例如,Jukebox 模型有几个 `lm_head` 模块需要跳过。使用 `llm_int8_skip_modules` 参数进行相应操作。

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "bigscience/bloom-1b7"

quantization_config = BitsAndBytesConfig(
    llm_int8_skip_modules=["lm_head"],
)

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
#### 微调已加载为8位精度的模型

借助Hugging Face生态系统中适配器(adapters)的官方支持,您可以在8位精度下微调模型。这使得可以在单个Google Colab中微调大模型,例如`flan-t5-large`或`facebook/opt-6.7b`。请查看[`peft`](https://github.com/huggingface/peft)库了解更多详情。

注意,加载模型进行训练时无需传递`device_map`。它将自动将您的模型加载到GPU上。如果需要,您可以将设备映射为特定设备(例如`cuda:0`、`0`、`torch.device('cuda:0')`)。请注意,`device_map=auto`仅应用于推理。

### BitsAndBytesConfig

[[autodoc]] BitsAndBytesConfig

## 使用 🤗 `optimum` 进行量化

请查看[Optimum 文档](https://huggingface.co/docs/optimum/index)以了解更多关于`optimum`支持的量化方法,并查看这些方法是否适用于您的用例。
58
docs/source/zh/main_classes/text_generation.md
Normal file
@ -0,0 +1,58 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# Generation

每个框架都在它们各自的 `GenerationMixin` 类中实现了文本生成的 `generate` 方法:

- PyTorch [`~generation.GenerationMixin.generate`] 在 [`~generation.GenerationMixin`] 中实现。
- TensorFlow [`~generation.TFGenerationMixin.generate`] 在 [`~generation.TFGenerationMixin`] 中实现。
- Flax/JAX [`~generation.FlaxGenerationMixin.generate`] 在 [`~generation.FlaxGenerationMixin`] 中实现。

无论您选择哪个框架,都可以使用 [`~generation.GenerationConfig`] 类实例对 generate 方法进行参数化。有关生成方法的控制参数的完整列表,请参阅此类。

要了解如何检查模型的生成配置、默认值是什么、如何临时更改参数以及如何创建和保存自定义生成配置,请参阅 [文本生成策略指南](../generation_strategies)。该指南还解释了如何使用相关功能,如token流。
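下面是一个使用 PyTorch 版 `generate` 并通过 [`~generation.GenerationConfig`] 控制生成参数的简短示意(模型与参数取值仅作示例):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 通过 GenerationConfig 指定生成参数
generation_config = GenerationConfig(max_new_tokens=20, do_sample=True, top_k=50)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```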
## GenerationConfig

[[autodoc]] generation.GenerationConfig
- from_pretrained
- from_model_config
- save_pretrained

## GenerationMixin

[[autodoc]] generation.GenerationMixin
- generate
- compute_transition_scores
- greedy_search
- sample
- beam_search
- beam_sample
- contrastive_search
- group_beam_search
- constrained_beam_search

## TFGenerationMixin

[[autodoc]] generation.TFGenerationMixin
- generate
- compute_transition_scores

## FlaxGenerationMixin

[[autodoc]] generation.FlaxGenerationMixin
- generate
65
docs/source/zh/main_classes/tokenizer.md
Normal file
@ -0,0 +1,65 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# Tokenizer

tokenizer负责准备输入以供模型使用。该库包含所有模型的tokenizer。大多数tokenizer都有两种版本:一个是完全的 Python 实现,另一个是基于 Rust 库 [🤗 Tokenizers](https://github.com/huggingface/tokenizers) 的“Fast”实现。"Fast" 实现允许:

1. 在批量分词时显著提速
2. 提供在原始字符串(字符和单词)和token空间之间进行映射的其他方法(例如,获取包含给定字符的token的索引或与给定token对应的字符范围)。

基类 [`PreTrainedTokenizer`] 和 [`PreTrainedTokenizerFast`] 实现了在模型输入中编码字符串输入的常用方法(见下文),并从本地文件或目录或从库提供的预训练的 tokenizer(从 HuggingFace 的 AWS S3 存储库下载)实例化/保存 python 和“Fast” tokenizer。它们都依赖于包含常用方法的 [`~tokenization_utils_base.PreTrainedTokenizerBase`] 和 [`~tokenization_utils_base.SpecialTokensMixin`]。

因此,[`PreTrainedTokenizer`] 和 [`PreTrainedTokenizerFast`] 实现了使用所有tokenizers的主要方法:

- 分词(将字符串拆分为子词token字符串),将token字符串转换为id并转换回来,以及编码/解码(即分词并转换为整数)。
- 以独立于底层结构(BPE、SentencePiece……)的方式向词汇表中添加新tokens。
- 管理特殊tokens(如mask、句首等):添加它们,将它们分配给tokenizer中的属性以便于访问,并确保它们在分词过程中不会被分割。下面给出一个相应的示例。
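下面的代码片段演示了如何添加普通 tokens 与特殊 tokens(模型与新增词语仅作示例);添加 tokens 后通常还需要相应调整模型的嵌入矩阵大小:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 向词汇表中添加普通 tokens 和特殊 tokens
num_added = tokenizer.add_tokens(["new_tok1", "new_tok2"])
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

# 添加 tokens 后,需要同步调整模型的嵌入矩阵大小
model.resize_token_embeddings(len(tokenizer))
```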
[`BatchEncoding`] 包含 [`~tokenization_utils_base.PreTrainedTokenizerBase`] 的编码方法(`__call__`、`encode_plus` 和 `batch_encode_plus`)的输出,并且是从 Python 字典派生的。当tokenizer是纯 Python tokenizer时,此类的行为就像标准的 Python 字典一样,并保存这些方法计算的各种模型输入(`input_ids`、`attention_mask` 等)。当分词器是“Fast”分词器时(即由 HuggingFace 的 [tokenizers 库](https://github.com/huggingface/tokenizers) 支持),此类还提供了几种高级对齐方法,可用于在原始字符串(字符和单词)与token空间之间进行映射(例如,获取包含给定字符的token的索引或与给定token对应的字符范围)。
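下面用一个“Fast” tokenizer 简单演示这些对齐方法(模型仅作示例):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("Hugging Face Transformers")

print(encoding.tokens())            # 查看分词结果
print(encoding.word_ids())          # 每个 token 对应的单词索引(特殊 tokens 为 None)
print(encoding.token_to_chars(1))   # 某个 token 在原始字符串中的字符范围
```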
## PreTrainedTokenizer

[[autodoc]] PreTrainedTokenizer
- __call__
- add_tokens
- add_special_tokens
- apply_chat_template
- batch_decode
- decode
- encode
- push_to_hub
- all

## PreTrainedTokenizerFast

[`PreTrainedTokenizerFast`] 依赖于 [tokenizers](https://huggingface.co/docs/tokenizers) 库。可以非常简单地将从 🤗 tokenizers 库获取的tokenizers加载到 🤗 transformers 中。查看 [使用 🤗 tokenizers 的分词器](../fast_tokenizers) 页面以了解如何执行此操作。

[[autodoc]] PreTrainedTokenizerFast
- __call__
- add_tokens
- add_special_tokens
- apply_chat_template
- batch_decode
- decode
- encode
- push_to_hub
- all

## BatchEncoding

[[autodoc]] BatchEncoding