
| run | train steps | batch size | samples/sec | train time (sec) | x faster |
|-----|-------------|------------|-------------|------------------|----------|
| huggingface-pytorch-v100x4 | 450 | 32 | 1.733 | 259.6039 | |
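
A quick sanity check on those figures: the logged train_samples_per_second (1.733) equals train steps divided by runtime (450 / 259.6039 ≈ 1.733), i.e. it counts optimization steps, so the number of sequences actually processed per second is roughly the total batch size times that value. A minimal sketch of the arithmetic, using only the numbers from the table and log in this section (variable names are illustrative):

```python
# Relate the benchmark columns to one another.
# All numbers come from the table/log in this section; nothing is re-measured.

train_steps = 450             # "Total optimization steps" in the log
total_batch_size = 32         # 8 per device x 4 V100 GPUs
train_runtime_sec = 259.6039  # 'train_runtime' reported by the Trainer

steps_per_sec = train_steps / train_runtime_sec
sequences_per_sec = train_steps * total_batch_size / train_runtime_sec

print(f"steps/sec     = {steps_per_sec:.3f}")      # ~1.733, matches the logged samples/sec
print(f"sequences/sec = {sequences_per_sec:.1f}")  # ~55.5 training sequences per second
```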
The run was launched with the run_mlm.py example script:

python run_mlm.py \
    --model_name_or_path roberta-base \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --output_dir ../test-result \
    --per_device_train_batch_size=8
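
For readers who prefer the Python API to the CLI, the block below is a rough, minimal sketch of what that command does with the Trainer directly. It is not a drop-in replacement for run_mlm.py: it tokenizes line by line with truncation instead of concatenating the corpus into fixed-length blocks, so example counts will differ from the log that follows, and the real script's seed handling and checkpointing are omitted.

```python
import math

from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Same model and dataset as the command above.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
raw = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    # Simplification: truncate each line instead of grouping text into blocks.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the default 15% MLM probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="../test-result",
    per_device_train_batch_size=8,  # scaled by the number of visible GPUs
    num_train_epochs=3,             # the Trainer default, as seen in the log
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
)

trainer.train()
eval_metrics = trainer.evaluate()
print("perplexity:", math.exp(eval_metrics["eval_loss"]))
```

Note that the command does not set the number of epochs or GPUs explicitly: the log shows the default 3 epochs, and with four V100s visible the per-device batch of 8 becomes a total train batch size of 32.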
[INFO|trainer.py:837] 2021-02-20 18:44:59,537 >> ***** Running training *****
[INFO|trainer.py:838] 2021-02-20 18:44:59,537 >> Num examples = 4798
[INFO|trainer.py:839] 2021-02-20 18:44:59,537 >> Num Epochs = 3
[INFO|trainer.py:840] 2021-02-20 18:44:59,537 >> Instantaneous batch size per device = 8
[INFO|trainer.py:841] 2021-02-20 18:44:59,537 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:842] 2021-02-20 18:44:59,537 >> Gradient Accumulation steps = 1
[INFO|trainer.py:843] 2021-02-20 18:44:59,537 >> Total optimization steps = 450
{'train_runtime': 259.6039, 'train_samples_per_second': 1.733, 'epoch': 3.0}
[INFO|trainer.py:1600] 2021-02-20 18:49:20,501 >> ***** Running Evaluation *****
[INFO|trainer.py:1601] 2021-02-20 18:49:20,501 >> Num examples = 496
[INFO|trainer.py:1602] 2021-02-20 18:49:20,501 >> Batch size = 32
02/20/2021 18:49:24 - INFO - __main__ - ***** Eval results *****
02/20/2021 18:49:24 - INFO - __main__ - perplexity = 3.526545416102001
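
The final metric ties back to the evaluation loss: run_mlm.py reports perplexity as the exponential of the masked-LM cross-entropy on the validation split, so the logged value implies an eval loss of roughly 1.26. A small check, using only the number from the log above:

```python
import math

perplexity = 3.526545416102001  # from the eval log above
eval_loss = math.log(perplexity)

print(f"implied eval_loss: {eval_loss:.4f}")            # ~1.2604
print(f"exp(eval_loss):    {math.exp(eval_loss):.6f}")  # recovers the logged perplexity
```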