《智能系统与技术丛书·AI安全之对抗样本入门》 (Intelligent Systems and Technology Series: Introduction to Adversarial Examples for AI Security)

5.7.2 Implementing CW with TensorFlow

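This section describes the basic procedure for implementing the CW algorithm on the TensorFlow platform. The example code is available at:
https://github.com/duoergun0729/adversarial_examples/blob/master/code/5-cw-tensorflow-pb.ipynb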
The code is based on the implementation published with the paper, available at:
https://github.com/carlini/nn_robust_attacks/blob/master/l2_attack.py
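The main flow of the example is as follows. The adversarial example is initialized with the values of the original image; then, using the forward function and the Adam optimizer, it is updated iteratively according to the CW algorithm until the maximum number of iterations is reached, while every iterate whose prediction equals the target label is recorded. Note that the CW implementation never computes gradients explicitly: computing and applying them is hidden inside the Adam optimizer, as shown in Figure 5-30.
Figure 5-30 Example of implementing CW with TensorFlow (targeted attack)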
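Below is the core code for a CW targeted attack in TensorFlow. First, define the global parameters: the maximum number of Adam iterations, the learning rate, the maximum number of binary-search steps, the initial value of c, the pixel-value bounds, and the label of the attack target. Two points deserve attention. First, the target label must be one-hot encoded. Second, although ImageNet 2012 has 1000 classes, the Inception model for TensorFlow has an output size of 1008, not 1000.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.AdamOptimizer)

# Maximum number of Adam iterations
max_iterations = 1000
# Adam learning rate
learning_rate = 0.01
# Maximum number of binary-search steps
binary_search_steps = 10
# Initial value of c
initial_const = 1e-3
# Current value of c (despite its name, this variable holds c, not the margin k)
confidence = initial_const
# Confidence margin k used in loss1; assumed to be 0 here, the default CONFIDENCE in the reference l2_attack.py
k = 0
# Pixel-value range
boxmin = 0.0
boxmax = 255.0
# Number of classes: 1008 in the TensorFlow Inception model
num_labels = 1008
# The attack target label must be one-hot encoded
target_label = 288
target_label = np.eye(num_labels)[target_label]
```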
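As a quick check of the one-hot step, the NumPy sketch below builds the target vector the same way (taking one row of the identity matrix) and confirms its length and the position of the 1. It reuses only the values given above (1008 outputs, target 288); the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(label, num_labels):
    """Return row `label` of the identity matrix: 1.0 at the label, 0.0 elsewhere (hypothetical helper)."""
    return np.eye(num_labels)[label]

# Values from the global parameters above: 1008 Inception outputs, target class 288
target = one_hot(288, 1008)
print(target.shape, int(np.argmax(target)), target.sum())
```

Indexing a row of `np.eye` is concise but allocates a full 1008 × 1008 matrix; setting a single element of `np.zeros(num_labels)` to 1 yields the same vector more cheaply.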
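Define four important TF variables. modifier is the perturbation, and it is the only quantity adjusted during the whole optimization. timg holds the original image (converted to tanh space, as described below), tlab holds the one-hot target label, and const holds the value of c. These last three variables are set to 0 during global initialization, but are assigned their actual values before each binary-search round.

```python
# `shape` is the shape of the input image batch, (1, H, W, 3); it is assumed to be defined earlier in the notebook
# The variable to be optimized
modifier = tf.Variable(np.zeros(shape, dtype=np.float32))
# Variables used to feed inputs into the graph
timg = tf.Variable(np.zeros(shape), dtype=tf.float32)
tlab = tf.Variable(np.zeros((num_labels)), dtype=tf.float32)
const = tf.Variable(np.zeros([]), dtype=tf.float32)
```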
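From the pixel-value bounds, compute the half-width boxmul and the midpoint boxplus of the pixel range, then map modifier + timg through tanh to obtain the new input newimg, i.e. the adversarial example. Because tanh maps any real number into (-1, 1), newimg always stays within [boxmin, boxmax].

```python
# Half-width and midpoint of the pixel range [boxmin, boxmax]
boxmul = (boxmax - boxmin) / 2.
boxplus = (boxmin + boxmax) / 2.
# tanh keeps the adversarial example inside the valid pixel range
newimg = tf.tanh(modifier + timg) * boxmul + boxplus
```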
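The NumPy sketch below illustrates why this change of variables enforces the box constraint: however large the perturbation, the mapped image stays within [0, 255]. It also checks the inverse mapping that the code applies to the original image before the attack (arctanh with the 0.999999 factor), showing that the round trip is nearly exact. It mirrors the TensorFlow expressions in NumPy; the random test data and the helper names `to_tanh_space` and `from_tanh_space` are illustrative assumptions.

```python
import numpy as np

boxmin, boxmax = 0.0, 255.0
boxmul = (boxmax - boxmin) / 2.0   # half-width of the pixel range: 127.5
boxplus = (boxmin + boxmax) / 2.0  # midpoint of the pixel range: 127.5

def to_tanh_space(image):
    # Inverse mapping; the 0.999999 factor keeps arctanh finite at pixel values 0 and 255
    return np.arctanh((image - boxplus) / boxmul * 0.999999)

def from_tanh_space(w):
    # tanh(w) lies in [-1, 1] (in floating point), so the result lies in [boxmin, boxmax]
    return np.tanh(w) * boxmul + boxplus

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1, 4, 4, 3)).astype(np.float64)
timg = to_tanh_space(image)
modifier = rng.normal(scale=100.0, size=image.shape)  # a deliberately huge perturbation
newimg = from_tanh_space(modifier + timg)

print(newimg.min() >= boxmin, newimg.max() <= boxmax)      # the box constraint holds
print(np.abs(from_tanh_space(timg) - image).max() < 1e-3)  # round trip is nearly exact
```

The design choice is that the optimizer works on an unconstrained variable, so no clipping is needed after each Adam step.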
Download the Inception model file from the TensorFlow website and extract it to a local directory. The model was trained on the ImageNet 2012 dataset.
http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
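Restore the computation graph and its parameters from the .pb file, mapping newimg to the graph's input.

```python
session = tf.Session()

def create_graph(dirname):
    # Read the frozen Inception graph from the .pb file
    with tf.gfile.FastGFile(dirname, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        # Feed newimg into the graph in place of its "ExpandDims:0" input tensor
        _ = tf.import_graph_def(graph_def, name='',
                                input_map={"ExpandDims:0": newimg})

create_graph("models/classify_image_graph_def.pb")
# Initializing the variables is essential
session.run(tf.global_variables_initializer())
# Names of all nodes in the graph, useful for looking up tensor names
tensorlist = [n.name for n in session.graph_def.node]
# Always check the tensor names in the current graph before hard-coding them
softmax_tensor = session.graph.get_tensor_by_name('softmax:0')
logits_tensor = session.graph.get_tensor_by_name('softmax/logits:0')
# The CW loss works on the logits (pre-softmax outputs)
output = logits_tensor
```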
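Define the objective function loss. loss1 measures the gap between the logit of the attack target and the largest logit among all other classes, and loss2 measures the size of the perturbation.

```python
# Squared L2 distance between the adversarial example and the original image
l2dist = tf.reduce_sum(tf.square(newimg - (tf.tanh(timg) * boxmul + boxplus)), [1, 2, 3])
# Logit of the target class, and the largest logit among the other classes
# (subtracting 10000 at the target position keeps it from being selected as "other")
real = tf.reduce_sum(tlab * output, 1)
other = tf.reduce_max((1 - tlab) * output - (tlab * 10000), 1)
# Hinge on the logit gap, with confidence margin k
loss1 = tf.maximum(0.0, other - real + k)
# Total loss
loss2 = tf.reduce_sum(l2dist)
loss1 = tf.reduce_sum(const * loss1)
loss = loss1 + loss2
```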
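To make the two terms concrete, the NumPy sketch below evaluates the same objective, ‖x′ − x‖² + c · max(0, max over i≠t of Z(x′)_i − Z(x′)_t + k), for a single toy logit vector Z and a toy perturbation. It follows the TensorFlow expressions above, including the 10000 offset; the four-class logits, the perturbation, and the values of c and k are illustrative assumptions, and `cw_loss` is a hypothetical name.

```python
import numpy as np

def cw_loss(logits, tlab, delta, c, k=0.0):
    """CW targeted loss for one example: c * hinge on the logit gap + squared L2 of delta."""
    real = np.sum(tlab * logits)                        # logit of the target class
    other = np.max((1 - tlab) * logits - tlab * 10000)  # largest non-target logit
    loss1 = c * max(0.0, other - real + k)
    loss2 = np.sum(np.square(delta))                    # squared L2 distance
    return loss1 + loss2, loss1, loss2

tlab = np.eye(4)[2]                 # toy example: target is class 2 of 4
delta = np.array([0.5, -0.5, 1.0])  # toy perturbation

# Target logit 1.0 is below the best other logit 3.0, so the hinge term is active
print(cw_loss(np.array([3.0, 0.5, 1.0, -2.0]), tlab, delta, c=10.0))
# Target logit 4.0 already beats every other logit by more than k, so only the L2 term remains
print(cw_loss(np.array([3.0, 0.5, 4.0, -2.0]), tlab, delta, c=10.0))
```

Once the target logit leads by at least k, loss1 drops to zero and the optimizer only shrinks the perturbation; c sets how strongly misclassification is weighted against distance.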
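Define the optimizer. CW uses Adam, mainly because Adam converges quickly.

```python
optimizer = tf.train.AdamOptimizer(learning_rate)
# Only modifier is updated; the model weights stay fixed
train_op = optimizer.minimize(loss, var_list=[modifier])
```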
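Because `optimizer.minimize` both differentiates `loss` with respect to `modifier` and applies the update, the attack code never touches a gradient directly. As an illustration of what each `train_op` step does, the sketch below implements the textbook Adam rule (Kingma and Ba) in NumPy with TensorFlow 1.x's default beta1=0.9, beta2=0.999 and epsilon=1e-8, and uses it to minimize a toy quadratic with a hand-written gradient. It is a simplified stand-in, not TensorFlow's code: TensorFlow folds the bias correction into the learning rate, so its epsilon plays a slightly different role. The function name and the toy objective are assumptions.

```python
import numpy as np

def adam_minimize(grad_fn, x0, learning_rate=0.01, steps=1000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using the Adam update rule."""
    x = np.array(x0, dtype=np.float64)
    m = np.zeros_like(x)  # running mean of the gradient (first moment)
    v = np.zeros_like(x)  # running mean of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x = x - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = sum((x - target)^2), whose gradient is 2 * (x - target)
target = np.array([3.0, -1.0])
x = adam_minimize(lambda x: 2 * (x - target), x0=[0.0, 0.0], learning_rate=0.1, steps=2000)
print(x)
```

In the attack, the role of `grad_fn` is played by TensorFlow's automatic differentiation of the CW loss with respect to modifier, with learning_rate set to 0.01.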
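Convert the original image to tanh space, define the binary-search bounds lower_bound and upper_bound, and initialize c. o_bestl2 records the smallest distance found during the iterations, o_bestscore the corresponding predicted label, and o_bestattack the corresponding adversarial example.

```python
# Convert to tanh space (the 0.999999 factor keeps arctanh finite at the range edges)
image = np.arctanh((image - boxplus) / boxmul * 0.999999)
# Initial search bounds for c
lower_bound = 0
c = initial_const
upper_bound = 1e10
# Best L2 value, predicted label, and adversarial example found so far
o_bestl2 = 1e10
o_bestscore = -1
o_bestattack = [np.zeros(shape)]
```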
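Run the binary search. Before each round, all global variables are re-initialized, mainly to reset the internal state of Adam.

```python
for outer_step in range(binary_search_steps):
    print("o_bestl2={} confidence={}".format(o_bestl2, confidence))
    # Reset Adam's internal state (this also sets modifier back to 0)
    session.run(tf.global_variables_initializer())
    # Feed the original image (tanh space), the one-hot target label and the current c
    session.run(tf.assign(timg, image))
    session.run(tf.assign(tlab, target_label))
    session.run(tf.assign(const, confidence))
```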
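Inside the outer loop, optimize iteratively with Adam; the paper suggests 10000 iterations. If the prediction on the adversarial example matches the target label, the targeted attack has succeeded, and o_bestl2, o_bestscore and o_bestattack are updated. Unlike other targeted attack algorithms, CW iteratively searches for the optimal adversarial example. In this example, the optimal adversarial example is defined as one whose prediction matches the target label and whose L2 distance is the smallest. Therefore CW does not stop as soon as the prediction matches the target label at some iteration; it keeps iterating, looking for the smallest c at which the prediction still matches the target label.

```python
    for iteration in range(max_iterations):
        # Run one Adam step and fetch the loss, L2 distance, logits and current image
        _, l, l2, sc, nimg = session.run([train_op, loss, l2dist, output, newimg])
        # Print the loss terms each time another 10% of the iterations is completed
        if iteration % (max_iterations // 10) == 0:
            print(iteration, session.run((loss, loss1, loss2)))
        # Keep the example if it hits the target label with a smaller L2 than any so far
        if (l2 < o_bestl2) and (np.argmax(sc) == np.argmax(target_label)):
            print("attack success l2={} target_label={}".format(l2, np.argmax(target_label)))
            o_bestl2 = l2
            o_bestscore = np.argmax(sc)
            o_bestattack = nimg
```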
After each binary-search round (still inside the outer loop), update c and its bounds.

```python
    confidence_old = -1
    if (o_bestscore == np.argmax(target_label)) and o_bestscore != -1:
        # Attack succeeded: decrease c
        upper_bound = min(upper_bound, confidence)
        if upper_bound < 1e9:
            confidence_old = confidence
            confidence = (lower_bound + upper_bound) / 2
    else:
        # Attack failed: increase c (x10 until the first success, then bisect)
        lower_bound = max(lower_bound, confidence)
        confidence_old = confidence
        if upper_bound < 1e9:
            confidence = (lower_bound + upper_bound) / 2
        else:
            confidence *= 10
```
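The schedule for c can be explored without a model. The pure-Python sketch below replays the update rule above, replacing a full Adam round with a hypothetical predicate `attack_succeeds(c)` that simply reports success whenever c reaches some threshold; the threshold of 2000 is an arbitrary assumption for illustration. Unlike the listing above, which tests the running best `o_bestscore`, this sketch decides each round from that round's own outcome, as the reference l2_attack.py does (it resets its per-round `bestscore` at the start of every outer step).

```python
def search_c(attack_succeeds, initial_const=1e-3, steps=10):
    """Binary search on c following the CW schedule; returns the smallest successful c, or None."""
    lower_bound, upper_bound = 0.0, 1e10
    c = initial_const
    best_c = None
    for _ in range(steps):
        if attack_succeeds(c):
            # Success: remember c, then try a smaller value
            best_c = c if best_c is None else min(best_c, c)
            upper_bound = min(upper_bound, c)
            if upper_bound < 1e9:
                c = (lower_bound + upper_bound) / 2
        else:
            # Failure: grow c tenfold until the first success, then bisect
            lower_bound = max(lower_bound, c)
            if upper_bound < 1e9:
                c = (lower_bound + upper_bound) / 2
            else:
                c *= 10
    return best_c

# Hypothetical stand-in for an attack round: succeeds exactly when c >= 2000
print(search_c(lambda c: c >= 2000.0, steps=20))  # approaches 2000 from above
```

Multiplying by 10 quickly finds an order of magnitude at which the attack works; bisection then narrows c down, trading misclassification strength for a smaller perturbation.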
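As shown in Figure 5-31, after 10 rounds of binary search with 10000 Adam iterations per round, the attack succeeds. The value of c is 2125.0; L0 is 15828, meaning only 15828 pixels were modified; and L2, which measures the difference between the adversarial example and the original image, is 31138.6.
l0=15828 l2=31138.633736887045
Figure 5-31 Comparison of the original image and the adversarial example (10000 Adam iterations)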
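The L0 and L2 figures compare the adversarial example with the original image. Below is a minimal NumPy sketch of how such numbers can be computed, assuming both images are arrays of the same shape on the 0 to 255 scale. L0 counts the array elements that changed, and the L2 term is computed as a sum of squared differences, matching `l2dist` in the loss code; since the text does not say whether the reported value is squared or rooted, both are returned. The name `perturbation_norms` and its `atol` parameter are hypothetical; a tolerance can be useful because the tanh round trip shifts nearly every value by a tiny amount.

```python
import numpy as np

def perturbation_norms(original, adversarial, atol=0.0):
    """Return (L0, squared L2, L2) of the perturbation adversarial - original."""
    diff = np.asarray(adversarial, dtype=np.float64) - np.asarray(original, dtype=np.float64)
    # L0: number of values whose change exceeds atol (atol=0.0 counts every change)
    l0 = int(np.count_nonzero(np.abs(diff) > atol))
    # Squared L2: the same quantity as l2dist in the loss code
    l2_squared = float(np.sum(diff ** 2))
    return l0, l2_squared, float(np.sqrt(l2_squared))

original = np.zeros((1, 2, 2, 3))
adversarial = original.copy()
adversarial[0, 0, 0, 0] = 3.0
adversarial[0, 1, 1, 2] = -4.0
print(perturbation_norms(original, adversarial))  # (2, 25.0, 5.0)
```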