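Preface
Most of this post is borrowed from the reference article (cited at the end); it serves only as my study notes.
A note up front: Google Colab limits GPU usage time, so avoid selecting a GPU runtime when you don't need one.
Steps 1-6 need only a CPU; a GPU is needed only from step 7 onward.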
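1 Mount Google Drive and set the working directory (CPU only)
Changing directory with a cd <path> shell command works only intermittently (a !cd runs in its own subshell, so the change does not carry over to later commands), so use os.chdir instead.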
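from google.colab import drive
drive.mount('/content/drive')  # mount Google Drive (asks for authorization)
import os
os.chdir("/content/drive/My Drive")
!ls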
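The cd behaviour noted above can be reproduced outside Colab. The sketch below assumes that a notebook ! command runs in its own child shell, so a cd inside it cannot change the working directory of the notebook process, whereas os.chdir changes it directly. The temporary directory is only a stand-in for the Drive path.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = os.path.realpath(tempfile.mkdtemp())  # stand-in for the Drive path

# Rough equivalent of `!cd <target>`: the cd runs in a child shell that then exits.
subprocess.run(["sh", "-c", 'cd "$1"', "sh", target], check=True)
after_shell_cd = os.getcwd()  # the Python process has not moved

# os.chdir acts on the current Python process, so the change persists.
os.chdir(target)
after_chdir = os.getcwd()

print(after_shell_cd == start)  # the subshell cd had no lasting effect
print(os.path.realpath(after_chdir) == target)  # os.chdir did
os.chdir(start)  # restore the original directory
```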
2 Download the faster-rcnn project files
!git clone -b pytorch-1.0 https://github.com/jwyang/faster-rcnn.pytorch.git
Output
Cloning into 'faster-rcnn.pytorch'...
remote: Enumerating objects: 3858, done.
remote: Total 3858 (delta 0), reused 0 (delta 0), pack-reused 3858
Receiving objects: 100% (3858/3858), 6.22 MiB | 8.10 MiB/s, done.
Resolving deltas: 100% (2615/2615), done.
3 Change into the project directory
os.chdir('faster-rcnn.pytorch')  # a !cd here would not persist to later commands
!ls
4 Create the data folder and download the pretrained models
The working directory should now be faster-rcnn.pytorch.
!mkdir data
os.chdir('data')
!mkdir pretrained_model
os.chdir('pretrained_model')
# Download the pretrained ResNet-101 model
!wget https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/resnet101_caffe.pth
# Download the pretrained VGG16 model
!wget https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/vgg16_caffe.pth
Output
--2020-11-02 16:47:59-- https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/resnet101_caffe.pth
Resolving filebox.ece.vt.edu (filebox.ece.vt.edu)... 128.173.88.43
Connecting to filebox.ece.vt.edu (filebox.ece.vt.edu)|128.173.88.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 178678116 (170M)
Saving to: ‘resnet101_caffe.pth’
resnet101_caffe.pth 100%[===================>] 170.40M 41.1MB/s in 8.2s
2020-11-02 16:48:07 (20.8 MB/s) - ‘resnet101_caffe.pth’ saved [178678116/178678116]
--2020-11-02 16:48:08-- https://filebox.ece.vt.edu/~jw2yang/faster-rcnn/pretrained-base-models/vgg16_caffe.pth
Resolving filebox.ece.vt.edu (filebox.ece.vt.edu)... 128.173.88.43
Connecting to filebox.ece.vt.edu (filebox.ece.vt.edu)|128.173.88.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 553433685 (528M)
Saving to: ‘vgg16_caffe.pth’
vgg16_caffe.pth 100%[===================>] 527.79M 32.3MB/s in 27s
2020-11-02 16:48:35 (19.3 MB/s) - ‘vgg16_caffe.pth’ saved [553433685/553433685]
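Because Colab sessions get interrupted, it can help to check whether the two model files are already complete before downloading them again. The sketch below compares file sizes against the byte counts from the Length: lines of the wget output above. The helper names is_complete and missing_models are hypothetical, and a size check is only a rough integrity test, not a checksum.

```python
import os
import tempfile

# Byte counts taken from the "Length:" lines of the wget output.
EXPECTED_SIZES = {
    "resnet101_caffe.pth": 178678116,
    "vgg16_caffe.pth": 553433685,
}

def is_complete(path, expected_size):
    """Return True if the file exists and has exactly the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size

def missing_models(model_dir, expected=EXPECTED_SIZES):
    """List the model files that still need to be downloaded (or re-downloaded)."""
    return [name for name, size in expected.items()
            if not is_complete(os.path.join(model_dir, name), size)]

# Demo on an empty temporary directory: nothing has been downloaded yet.
demo_dir = tempfile.mkdtemp()
print(missing_models(demo_dir))
```

In the notebook this would be called as missing_models('data/pretrained_model') from the project directory, running wget only for the names it returns.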
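5 Download the VOC2007 dataset into the data folder
After the os.chdir below, the working directory should be faster-rcnn.pytorch/data.
os.chdir('../')  # go up one level, i.e. back to data/
# Download the dataset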
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
# Extract the archives
!tar xvf VOCtrainval_06-Nov-2007.tar
!tar xvf VOCtest_06-Nov-2007.tar
!tar xvf VOCdevkit_08-Jun-2007.tar
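# Create the symlink
!ln -s $VOCdevkit VOCdevkit2007  # $VOCdevkit is a placeholder for the path to the extracted VOCdevkit folder. Note: if the extracted folder is named "VOCdevkit", rename it to "VOCdevkit2007", otherwise later steps will fail.
Error: ln: failed to create symbolic link ‘./VOCdevkit2007’: Operation not supported
Analysis: a path problem. Change into the data directory and create the symlink again.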
!ls
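# Create the symlink
!ln -s $VOCdevkit VOCdevkit2007  # try once more
If the command reports File exists, VOCdevkit2007 is already in place (it appears in the listing below), so no further action is needed.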
pretrained_model VOCdevkit2007 VOCtrainval_06-Nov-2007.tar
VOCdevkit_08-Jun-2007.tar VOCtest_06-Nov-2007.tar
ln: failed to create symbolic link './VOCdevkit2007': File exists
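The two outcomes above (Operation not supported, then File exists) can be handled in one place. The sketch below is a hypothetical helper, ensure_devkit2007, that tries to create the link and falls back to renaming the folder when the filesystem refuses links. It assumes the extracted folder is called VOCdevkit and sits directly in data; the symlink parameter exists only so the fallback can be exercised in a demo.

```python
import os
import tempfile

def ensure_devkit2007(data_dir, symlink=os.symlink):
    """Make data_dir/VOCdevkit2007 available; return how it was obtained."""
    src = os.path.join(data_dir, "VOCdevkit")
    dst = os.path.join(data_dir, "VOCdevkit2007")
    if os.path.exists(dst):
        return "exists"  # nothing to do (the "File exists" case)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    try:
        # Relative link, like `ln -s VOCdevkit VOCdevkit2007` run inside data_dir.
        symlink("VOCdevkit", dst)
        return "symlink"
    except OSError:
        os.rename(src, dst)  # fallback when links are not supported
        return "renamed"

# Demo in a temporary directory standing in for faster-rcnn.pytorch/data.
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "VOCdevkit"))
print(ensure_devkit2007(demo))  # first call links (or renames) the folder
print(ensure_devkit2007(demo))  # second call finds it already there
```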
6 From the data directory, move into lib and compile
os.chdir('../lib')
!python setup.py build develop
# Output of a successful build
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/content/drive/My Drive/faster-rcnn.pytorch/lib/model/csrc/vision.o build/temp.linux-x86_64-3.6/content/drive/My Drive/faster-rcnn.pytorch/lib/model/csrc/cpu/ROIAlign_cpu.o build/temp.linux-x86_64-3.6/content/drive/My Drive/faster-rcnn.pytorch/lib/model/csrc/cpu/nms_cpu.o -L/usr/local/lib/python3.6/dist-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.6/model/_C.cpython-36m-x86_64-linux-gnu.so
running develop
running egg_info
creating faster_rcnn.egg-info
writing faster_rcnn.egg-info/PKG-INFO
writing dependency_links to faster_rcnn.egg-info/dependency_links.txt
writing top-level names to faster_rcnn.egg-info/top_level.txt
writing manifest file 'faster_rcnn.egg-info/SOURCES.txt'
writing manifest file 'faster_rcnn.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.6/model/_C.cpython-36m-x86_64-linux-gnu.so -> model
Creating /usr/local/lib/python3.6/dist-packages/faster-rcnn.egg-link (link to .)
Adding faster-rcnn 0.1 to easy-install.pth file
Installed /content/drive/My Drive/faster-rcnn.pytorch/lib
Processing dependencies for faster-rcnn==0.1
Finished processing dependencies for faster-rcnn==0.1
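Rather than reading the whole log, a quick way to confirm the build worked is to look for the compiled module that the log says was copied into model. The sketch below uses a hypothetical helper, find_extension, and assumes the file name pattern _C*.so seen in the log; the demo fakes the file in a temporary directory.

```python
import glob
import os
import tempfile

def find_extension(lib_dir):
    """Return paths of compiled _C extension modules under lib_dir/model."""
    # glob.escape protects special characters that may occur in the path.
    pattern = os.path.join(glob.escape(lib_dir), "model", "_C*.so")
    return sorted(glob.glob(pattern))

# Demo: fake the layout the build log describes, inside a temporary directory.
demo_lib = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_lib, "model"))
so_name = "_C.cpython-36m-x86_64-linux-gnu.so"
open(os.path.join(demo_lib, "model", so_name), "wb").close()

found = find_extension(demo_lib)
print([os.path.basename(p) for p in found])
```

In the notebook, find_extension('.') run from the lib directory should return a non-empty list after a successful build.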
7 Switch to the GPU runtime and set the working directory again
Changing the runtime type starts a fresh session, so Google Drive has to be mounted again before the path below exists.
import os
os.chdir("/content/drive/My Drive/faster-rcnn.pytorch/")
!ls
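Before rebuilding and training, it is worth confirming that the runtime really has a GPU attached. The sketch below is a heuristic that does not depend on PyTorch: it assumes the nvidia-smi tool is on the PATH of a GPU runtime and that nvidia-smi -L prints one line per GPU. The function name gpu_visible is hypothetical.

```python
import shutil
import subprocess

def gpu_visible():
    """Heuristic: True if nvidia-smi is present and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # tool not installed, most likely a CPU-only runtime
    try:
        result = subprocess.run([exe, "-L"], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU visible:", gpu_visible())
```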
8 Rebuild with setup.py
The step 6 build log lists only CPU object files (ROIAlign_cpu.o, nms_cpu.o), so the build is repeated here, presumably so that the CUDA sources are compiled now that a GPU is attached.
os.chdir('lib/')
!python setup.py build develop
os.chdir('../')
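9 Edit trainval_net.py
Line 195: cfg.TRAIN.USE_FLIPPED = False  # change True to False
Purpose: skip horizontal flipping of the images, which shortens data loading time.
Note that the config dump in the training output still shows 'USE_FLIPPED': True, presumably because the config is printed before this line runs; the 5011 roidb entries (no flipped copies) suggest the change took effect.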
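The same edit can be made from a notebook cell instead of the file editor. The sketch below only demonstrates the text substitution on a sample string; the sample lines are an assumption about what the file looks like around line 195, and disable_flipping is a hypothetical helper. To apply it, the file contents would be read, passed through the function, and written back.

```python
import re

def disable_flipping(source):
    """Return (new_source, count) with USE_FLIPPED = True set to False."""
    pattern = re.compile(
        r"^([ \t]*cfg\.TRAIN\.USE_FLIPPED[ \t]*=[ \t]*)True\b",
        re.MULTILINE,
    )
    return pattern.subn(r"\g<1>False", source)

# Assumed form of the relevant lines in trainval_net.py (illustration only).
sample = "  cfg.TRAIN.USE_FLIPPED = True\n  cfg.USE_GPU_NMS = args.cuda\n"
patched, count = disable_flipping(sample)
print(count)
print(patched, end="")
```

Checking that count is 1 before writing the file back guards against the line having a different form than assumed.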
10 Start training (loading the dataset takes roughly 10-20 minutes)
# Run on the GPU
!CUDA_VISIBLE_DEVICES=0 python3 trainval_net.py \
--dataset pascal_voc \
--net res101 \
--bs 4 \
--nw 0 \
--lr 0.004 \
--lr_decay_step 8 \
--epochs 10 \
--cuda
Output
Called with args:
Namespace(batch_size=4, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc', disp_interval=100, large_scale=False, lr=0.004, lr_decay_gamma=0.1, lr_decay_step=8, mGPUs=False, max_epochs=10, net='res101', num_workers=0, optimizer='sgd', resume=False, save_dir='models', session=1, start_epoch=1, use_tfboard=False)
Using config:
{ 'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [8, 16, 32],
'CROP_RESIZE_WITH_MAX_POOL': False,
'CUDA': False,
'DATA_DIR': '/content/drive/My Drive/faster-rcnn.pytorch/data',
'DEDUP_BOXES': 0.0625,
'EPS': 1e-14,
'EXP_DIR': 'res101',
'FEAT_STRIDE': [16],
'GPU_ID': 0,
'MATLAB': 'matlab',
'MAX_NUM_GT_BOXES': 20,
'MOBILENET': { 'DEPTH_MULTIPLIER': 1.0,
'FIXED_LAYERS': 5,
'REGU_DEPTH': False,
'WEIGHT_DECAY': 4e-05},
'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
'POOLING_MODE': 'align',
'POOLING_SIZE': 7,
'RESNET': { 'FIXED_BLOCKS': 1, 'MAX_POOL': False},
'RNG_SEED': 3,
'ROOT_DIR': '/content/drive/My Drive/faster-rcnn.pytorch',
'TEST': { 'BBOX_REG': True,
'HAS_RPN': True,
'MAX_SIZE': 1000,
'MODE': 'nms',
'NMS': 0.3,
'PROPOSAL_METHOD': 'gt',
'RPN_MIN_SIZE': 16,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'RPN_TOP_N': 5000,
'SCALES': [600],
'SVM': False},
'TRAIN': { 'ASPECT_GROUPING': False,
'BATCH_SIZE': 128,
'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_NORMALIZE_TARGETS': True,
'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
'BBOX_REG': True,
'BBOX_THRESH': 0.5,
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'BIAS_DECAY': False,
'BN_TRAIN': False,
'DISPLAY': 20,
'DOUBLE_BIAS': False,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'GAMMA': 0.1,
'HAS_RPN': True,
'IMS_PER_BATCH': 1,
'LEARNING_RATE': 0.001,
'MAX_SIZE': 1000,
'MOMENTUM': 0.9,
'PROPOSAL_METHOD': 'gt',
'RPN_BATCHSIZE': 256,
'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 8,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALES': [600],
'SNAPSHOT_ITERS': 5000,
'SNAPSHOT_KEPT': 3,
'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
'STEPSIZE': [30000],
'SUMMARY_INTERVAL': 180,
'TRIM_HEIGHT': 600,
'TRIM_WIDTH': 600,
'TRUNCATED': False,
'USE_ALL_GT': True,
'USE_FLIPPED': True,
'USE_GT': False,
'WEIGHT_DECAY': 0.0001},
'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Preparing training data...
voc_2007_trainval gt roidb loaded from /content/drive/My Drive/faster-rcnn.pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
done
before filtering, there are 5011 images...
after filtering, there are 5011 images...
5011 roidb entries
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
/content/drive/My Drive/faster-rcnn.pytorch/lib/roi_data_layer/roibatchLoader.py:191: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
keep = torch.nonzero(not_keep == 0).view(-1)
[session 1][epoch 1][iter 0/1252] loss: 5.0488, lr: 4.00e-03
fg/bg=(91/421), time cost: 0.546721
rpn_cls: 0.7514, rpn_box: 0.4337, rcnn_cls: 3.4795, rcnn_box 0.3841
[session 1][epoch 1][iter 100/1252] loss: 1.6237, lr: 4.00e-03
fg/bg=(100/412), time cost: 78.455308
rpn_cls: 0.0927, rpn_box: 0.0339, rcnn_cls: 0.5683, rcnn_box 0.4468
[session 1][epoch 1][iter 200/1252] loss: 1.2805, lr: 4.00e-03
fg/bg=(105/407), time cost: 80.196287
rpn_cls: 0.1298, rpn_box: 0.0758, rcnn_cls: 0.5040, rcnn_box 0.4562
[session 1][epoch 1][iter 300/1252] loss: 1.1924, lr: 4.00e-03
fg/bg=(128/384), time cost: 80.389846
rpn_cls: 0.2270, rpn_box: 0.0917, rcnn_cls: 0.5527, rcnn_box 0.5752
[session 1][epoch 1][iter 400/1252] loss: 1.1088, lr: 4.00e-03
fg/bg=(82/430), time cost: 80.646110
rpn_cls: 0.0800, rpn_box: 0.0659, rcnn_cls: 0.3532, rcnn_box 0.3635
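Since GPU time on Colab is limited, the log above can be used to estimate how long a full run will take. The sketch below parses four of the log entries and assumes that each time cost value covers the 100 iterations since the previous report (disp_interval=100 in the arguments). The iter 0 entry is left out because it covers only the first iteration.

```python
import re

# Four entries copied from the training output (iterations 100 to 400).
LOG = """\
[session 1][epoch 1][iter 100/1252] loss: 1.6237, lr: 4.00e-03
fg/bg=(100/412), time cost: 78.455308
[session 1][epoch 1][iter 200/1252] loss: 1.2805, lr: 4.00e-03
fg/bg=(105/407), time cost: 80.196287
[session 1][epoch 1][iter 300/1252] loss: 1.1924, lr: 4.00e-03
fg/bg=(128/384), time cost: 80.389846
[session 1][epoch 1][iter 400/1252] loss: 1.1088, lr: 4.00e-03
fg/bg=(82/430), time cost: 80.646110
"""

ITER_RE = re.compile(r"\[iter (\d+)/(\d+)\] loss: ([\d.]+)")
TIME_RE = re.compile(r"time cost: ([\d.]+)")

records = [(int(i), int(n), float(loss)) for i, n, loss in ITER_RE.findall(LOG)]
times = [float(t) for t in TIME_RE.findall(LOG)]

iters_per_epoch = records[0][1]        # 1252, i.e. 5011 images // batch size 4
sec_per_iter = sum(times) / len(times) / 100  # assumes 100 iterations per report
epoch_minutes = sec_per_iter * iters_per_epoch / 60
total_hours = epoch_minutes * 10 / 60  # --epochs 10

print(f"{epoch_minutes:.1f} min per epoch, {total_hours:.1f} h for 10 epochs")
```

Under these assumptions an epoch takes a little under 17 minutes, so the 10 epochs need just under 3 hours of GPU time.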
Reference
LCCFlccf, 使用colab训练faster-rcnn (Training faster-rcnn with Colab), 2019-04-14 21:50:57.