Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Output

TrainingArguments(output_dir=tmp_trainer, overwrite_output_dir=False,
do_train=False, do_eval=None, do_predict=False,
evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False,
per_device_train_batch_size=8, per_device_eval_batch_size=8,
gradient_accumulation_steps=1, eval_accumulation_steps=None,
learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9,
adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0,
num_train_epochs=3.0, max_steps=-1,
lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0,
warmup_steps=0, logging_dir=runs/Apr21_20-33-20_MONSTER,
logging_strategy=IntervalStrategy.STEPS, logging_first_step=False,
logging_steps=500, save_strategy=IntervalStrategy.STEPS,
save_steps=500, save_total_limit=None, no_cuda=False, seed=42,
fp16=False, fp16_opt_level=O1, fp16_backend=auto,
fp16_full_eval=False, local_rank=-1, tpu_num_cores=None,
tpu_metrics_debug=False, debug=False, dataloader_drop_last=False,
eval_steps=500, dataloader_num_workers=0, past_index=-1,
run_name=tmp_trainer, disable_tqdm=False, remove_unused_columns=True,
label_names=None, load_best_model_at_end=False,
metric_for_best_model=None, greater_is_better=None,
ignore_data_skip=False, sharded_ddp=[], deepspeed=None,
label_smoothing_factor=0.0, adafactor=False, group_by_length=False,
length_column_name=length, report_to=['tensorboard'],
ddp_find_unused_parameters=None, dataloader_pin_memory=True,
skip_memory_metrics=False, _n_gpu=1, mp_parameters=)

The Trainer creates an instance of TrainingArguments by itself, and the values above are the arguments' default values. There is the learning_rate=5e-05, and the num_train_epochs=3.0, and many, many others. The optimizer used, even though it's not listed above, is the AdamW, a variation of Adam.

We can create an instance of TrainingArguments ourselves to get at least a bit of control over the training process. The only required argument is the output_dir, but we'll specify some other arguments as well:

Training Arguments

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='output',
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    evaluation_strategy='steps',
    eval_steps=300,
    logging_steps=300,
    gradient_accumulation_steps=8,
)

"Batch size ONE?! You gotta be kidding me!"

Well, I would, if it were not for the gradient_accumulation_steps argument. That's how we can make the mini-batch size larger even if we're using a low-end GPU that is capable of handling only one data point at a time.

The Trainer can accumulate the gradients computed at every training step (which is taking only one data point), and, after eight steps, it uses the accumulated gradients to update the parameters. For all intents and purposes, it is as if the mini-batch had size eight. Awesome, right?

Moreover, let's set the logging_steps to three hundred, so it prints the training losses every three hundred mini-batches (and it counts the mini-batches as having size eight due to the gradient accumulation).

"What about validation losses?"

The evaluation_strategy argument allows you to run an evaluation after every eval_steps steps (if set to steps like in the example above) or after every epoch (if set to epoch).

"Can I get it to print accuracy or other metrics too?"

Sure, you can! But, first, you need to define a function that takes an instance of EvalPrediction (returned by the internal validation loop), computes the desired metrics, and returns a dictionary:
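Such a function might look like the minimal sketch below. It assumes a classification model whose predictions are logits, and it uses scikit-learn's accuracy_score to compute accuracy; both the function's name (compute_metrics) and the choice of metric are ours, for illustration only:

import numpy as np
from sklearn.metrics import accuracy_score
from transformers import EvalPrediction

def compute_metrics(eval_pred: EvalPrediction):
    # eval_pred.predictions holds the model outputs (logits, for a
    # classifier) gathered over the evaluation set, and
    # eval_pred.label_ids holds the corresponding labels
    logits = eval_pred.predictions
    labels = eval_pred.label_ids
    # takes the class with the highest logit as the predicted class
    predictions = np.argmax(logits, axis=-1)
    # returns a dictionary; each key is reported as a named metric
    return {'accuracy': accuracy_score(labels, predictions)}

The function is then handed to the Trainer through its compute_metrics argument, and the entries of the returned dictionary are reported alongside the validation loss at every evaluation.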

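Going back to the gradient accumulation discussed above: the snippet below is a self-contained sketch of what accumulating gradients over eight steps amounts to in a plain PyTorch training loop, using a dummy linear model and random data. It illustrates the idea only; it is not the Trainer's actual implementation:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# dummy data and model, just to make the sketch runnable
features = torch.randn(64, 10)
labels = torch.randint(0, 2, (64,))
train_loader = DataLoader(TensorDataset(features, labels), batch_size=1)

model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

accumulation_steps = 8  # plays the role of gradient_accumulation_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(train_loader):
    logits = model(x)
    # scales the loss down so the accumulated gradients average out
    # over the eight data points of the "effective" mini-batch
    loss = loss_fn(logits, y) / accumulation_steps
    loss.backward()
    # only updates the parameters (and resets the gradients)
    # once every eight steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Dividing each loss by accumulation_steps keeps the accumulated gradients on the same scale as the gradients of a genuine mini-batch of eight data points.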
