
Batch-reading CSV files with TensorFlow for deep learning algorithms

2017-08-06 19:15

So far I have used two deep learning frameworks, TensorFlow and Deeplearning4j. Deep learning algorithms generally want their training data fed in batches. Earlier posts already gave several Deeplearning4j examples; TensorFlow can likewise read data in batches and keep feeding it to a model. I just came across an example of this online: http://www.cnblogs.com/hunttown/p/6844477.html . The data is the Iris dataset, in the following format.

Anyone who does machine learning should recognize it:

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
21,5.4,3.4,1.7,0.2,Iris-setosa
22,5.1,3.7,1.5,0.4,Iris-setosa
23,4.6,3.6,1.0,0.2,Iris-setosa
24,5.1,3.3,1.7,0.5,Iris-setosa
25,4.8,3.4,1.9,0.2,Iris-setosa
26,5.0,3.0,1.6,0.2,Iris-setosa
27,5.0,3.4,1.6,0.4,Iris-setosa
28,5.2,3.5,1.5,0.2,Iris-setosa
29,5.2,3.4,1.4,0.2,Iris-setosa
30,4.7,3.2,1.6,0.2,Iris-setosa
31,4.8,3.1,1.6,0.2,Iris-setosa
32,5.4,3.4,1.5,0.4,Iris-setosa
33,5.2,4.1,1.5,0.1,Iris-setosa
34,5.5,4.2,1.4,0.2,Iris-setosa
35,4.9,3.1,1.5,0.1,Iris-setosa
36,5.0,3.2,1.2,0.2,Iris-setosa
37,5.5,3.5,1.3,0.2,Iris-setosa
39,5.5,4.2,1.4,0.2,Iris-virginica
40,4.9,3.1,1.5,0.1,Iris-versicolor
38,5.0,3.2,1.2,0.2,Iris-versicolor
51,5.5,3.5,1.3,0.2,Iris-versicolor


Here is the implementation:

import tensorflow as tf

path = "/Users/shuubiasahi/Desktop/业务相关文档/iris.csv"


def read_data(file_queue):
    # Read the file one line at a time, skipping the CSV header row.
    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(file_queue)
    # record_defaults: one entry per column, giving both the default value
    # and the dtype (int for Id, float for the four features, string for Species).
    defaults = [[0], [0.], [0.], [0.], [0.], ['']]
    Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species = tf.decode_csv(value, defaults)

    # Map the species string to an integer class label, -1 if unmatched.
    preprocess_op = tf.case({
        tf.equal(Species, tf.constant('Iris-setosa')): lambda: tf.constant(0),
        tf.equal(Species, tf.constant('Iris-versicolor')): lambda: tf.constant(1),
        tf.equal(Species, tf.constant('Iris-virginica')): lambda: tf.constant(2),
    }, lambda: tf.constant(-1), exclusive=True)

    return tf.stack([SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm]), preprocess_op
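Note how defaults does double duty for tf.decode_csv: each one-element list supplies both a column's default value and its dtype. A quick standalone check (my own illustration, reusing the first data row and the tf import above):

line = tf.constant("21,5.4,3.4,1.7,0.2,Iris-setosa")
fields = tf.decode_csv(line, [[0], [0.], [0.], [0.], [0.], ['']])
# fields[0] is int32 (Id), fields[1:5] are float32, fields[5] is a string.

The rest of the program turns these per-row tensors into shuffled batches: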


def create_pipeline(filename, batch_size, num_epochs=None):
    file_queue = tf.train.string_input_producer([filename], num_epochs=num_epochs)
    example, label = read_data(file_queue)

    # Keep at least min_after_dequeue examples buffered so that the
    # shuffling is reasonably random.
    min_after_dequeue = 1000
    capacity = min_after_dequeue + batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue
    )
    return example_batch, label_batch


x_train_batch, y_train_batch = create_pipeline(path, 5, num_epochs=1000)
x_test, y_test = create_pipeline(path, 60)

init_op = tf.global_variables_initializer()
# num_epochs is tracked as a local variable, so it needs its own initializer.
local_init_op = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    sess.run(local_init_op)

    # Start the threads that populate the filename and example queues.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    try:
        # Pull a few 60-example batches from the test pipeline:
        # while not coord.should_stop():
        for _ in range(6):
            example, label = sess.run([x_test, y_test])
            print(example)
            print(label)
    except tf.errors.OutOfRangeError:
        print('Done reading')
    finally:
        coord.request_stop()
    coord.join(threads)
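The loop above only prints test batches; the train pipeline is never consumed. To actually train on x_train_batch and y_train_batch, a model would be defined between the pipeline calls and the initializers. A minimal sketch with a linear softmax classifier (my own illustration, not part of the original post):

# Illustration only: linear softmax classifier on the 4 features.
# Define it before init_op is created so W and b are initialized too.
W = tf.Variable(tf.zeros([4, 3]))
b = tf.Variable(tf.zeros([3]))
logits = tf.matmul(x_train_batch, W) + b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y_train_batch, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

Inside the session, after the queue runners have started, running sess.run(train_op) in the while not coord.should_stop() loop would then train until the 1000-epoch train queue is exhausted and raises OutOfRangeError.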
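As an aside, this queue-based reader API reflects the TensorFlow of 2017; later releases deprecated it in favor of tf.data. A rough tf.data equivalent of create_pipeline, as a sketch (exact API details vary by TensorFlow 1.x version):

def create_dataset(filename, batch_size, num_epochs=None):
    defaults = [[0], [0.], [0.], [0.], [0.], ['']]

    def parse(line):
        Id, sl, sw, pl, pw, species = tf.decode_csv(line, defaults)
        label = tf.case({
            tf.equal(species, 'Iris-setosa'): lambda: tf.constant(0),
            tf.equal(species, 'Iris-versicolor'): lambda: tf.constant(1),
            tf.equal(species, 'Iris-virginica'): lambda: tf.constant(2),
        }, lambda: tf.constant(-1), exclusive=True)
        return tf.stack([sl, sw, pl, pw]), label

    return (tf.data.TextLineDataset(filename)
            .skip(1)              # skip the CSV header
            .map(parse)
            .shuffle(1000)
            .repeat(num_epochs)   # None means repeat forever
            .batch(batch_size))

Batches then come from dataset.make_one_shot_iterator().get_next() rather than from queue runners and a Coordinator.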