
My Take on Bulk-Inserting Data (1-Million-Row Scale, MySQL)

2016-12-22 21:25

Reposted from: http://blog.csdn.net/frinder/article/details/38830723

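A while ago, in a job interview, I was asked how to insert 100,000 records into a database efficiently. I had never dealt with this problem or read anything about it, so I could not answer. Today I read up on it and came up with three approaches.
The test database is MySQL!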
Method 1:


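// `conn` is a static java.sql.Connection opened elsewhere (not shown in the original).
public static void insert() {
    // Start time
    long begin = System.currentTimeMillis();
    // SQL prefix
    String prefix = "INSERT INTO tb_big_data (count, create_time, random) VALUES ";
    // The SQL is assembled as a string, so a plain Statement is enough; a
    // PreparedStatement brings no benefit here (and JDBC specifies that
    // addBatch(String) throws SQLException on a PreparedStatement)
    try (Statement st = conn.createStatement()) {
        // Turn off auto-commit so each chunk is committed explicitly
        conn.setAutoCommit(false);
        // Outer loop: 100 transactions in total
        for (int i = 1; i <= 100; i++) {
            // Value list "(...),(...),..." for this chunk
            StringBuilder suffix = new StringBuilder();
            // Inner loop: 10,000 rows per transaction (the "step")
            for (int j = 1; j <= 10000; j++) {
                suffix.append("(").append(j * i).append(", SYSDATE(), ")
                      .append(i * j * Math.random()).append("),");
            }
            // Drop the trailing comma to form one multi-row INSERT
            String sql = prefix + suffix.substring(0, suffix.length() - 1);
            // Send the whole chunk as a single statement
            st.executeUpdate(sql);
            // Commit this chunk
            conn.commit();
        }
        // Close the connection
        conn.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    // End time
    long end = System.currentTimeMillis();
    // Elapsed time: milliseconds / 1000 gives seconds, not ms
    System.out.println("cost: " + (end - begin) / 1000 + " s");
}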

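Elapsed time: 23 s (the timing code divides milliseconds by 1000, so the printed figure is in seconds, not ms).
In my tests, this is the most efficient of the three methods!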
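Method 1 splices the values straight into the SQL text. That is harmless for generated numbers, but for real data a common alternative is to keep the same multi-row shape and bind the values through placeholders. The sketch below assumes the article's tb_big_data table; the class and method names (MultiRowInsert, buildSql, insertChunk) are hypothetical, and this variant has not been benchmarked here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

public class MultiRowInsert {

    // Hypothetical helper: repeats one row template `rows` times after the prefix,
    // e.g. "... VALUES (?, SYSDATE(), ?),(?, SYSDATE(), ?)".
    static String buildSql(String prefix, String rowTemplate, int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be at least 1");
        }
        return prefix + String.join(",", Collections.nCopies(rows, rowTemplate));
    }

    // Inserts one chunk with a single multi-row statement; counts[r] and randoms[r]
    // are bound to the two placeholders of row r. Assumes conn is open and that
    // the caller manages the transaction (commit/rollback).
    static int insertChunk(Connection conn, long[] counts, double[] randoms) throws SQLException {
        if (counts.length != randoms.length) {
            throw new IllegalArgumentException("counts and randoms differ in length");
        }
        String sql = buildSql("INSERT INTO tb_big_data (count, create_time, random) VALUES ",
                "(?, SYSDATE(), ?)", counts.length);
        try (PreparedStatement pst = conn.prepareStatement(sql)) {
            int p = 1; // JDBC parameter indexes start at 1
            for (int r = 0; r < counts.length; r++) {
                pst.setLong(p++, counts[r]);
                pst.setDouble(p++, randoms[r]);
            }
            return pst.executeUpdate(); // number of rows inserted
        }
    }
}
```

Whichever variant is used, each chunk is still one statement, so it has to fit within the server's max_allowed_packet; that is one practical upper bound on the step size.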

Method 2:


public static void insertRelease() {
    long begin = System.currentTimeMillis();
    // One parameterized single-row INSERT, reused for every row
    String sql = "INSERT INTO tb_big_data (count, create_time, random) VALUES (?, SYSDATE(), ?)";
    try (PreparedStatement pst = conn.prepareStatement(sql)) {
        conn.setAutoCommit(false);
        // 100 transactions of 10,000 rows each
        for (int i = 1; i <= 100; i++) {
            for (int k = 1; k <= 10000; k++) {
                pst.setLong(1, k * i);
                pst.setLong(2, k * i);
                // Queue this parameter set; nothing is sent yet
                pst.addBatch();
            }
            // Send the queued rows (the batch is emptied afterwards), then commit
            pst.executeBatch();
            conn.commit();
        }
        conn.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    long end = System.currentTimeMillis();
    // Milliseconds / 1000 gives seconds
    System.out.println("cost: " + (end - begin) / 1000 + " s");
}

Note: the code mirrors method 1; the analysis follows below!
Elapsed time: 111 s
That is roughly five times as long as method 1!
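One likely reason for the gap: with MySQL Connector/J, executeBatch() on a batch of single-row INSERTs sends the statements one at a time by default. The driver's rewriteBatchedStatements=true connection property lets it rewrite such a batch into multi-row INSERTs, i.e. the shape method 1 builds by hand. The sketch below only shows where the property goes; the URL, credentials, and the class name RewrittenBatchInsert are placeholders, and the effect on the timings above has not been measured.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RewrittenBatchInsert {

    // Adds rewriteBatchedStatements=true to a MySQL JDBC URL,
    // using '?' or '&' depending on whether the URL already has parameters.
    static String withRewrite(String url) {
        return url + (url.contains("?") ? "&" : "?") + "rewriteBatchedStatements=true";
    }

    // Same loop as method 2; only the connection URL differs.
    // Needs MySQL Connector/J on the classpath and a reachable server when called.
    static void insert(String url, String user, String password) throws SQLException {
        String sql = "INSERT INTO tb_big_data (count, create_time, random) VALUES (?, SYSDATE(), ?)";
        try (Connection conn = DriverManager.getConnection(withRewrite(url), user, password);
             PreparedStatement pst = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false);
            for (int i = 1; i <= 100; i++) {
                for (int k = 1; k <= 10000; k++) {
                    pst.setLong(1, (long) k * i);
                    pst.setLong(2, (long) k * i);
                    pst.addBatch();
                }
                pst.executeBatch(); // with the property set, the driver may merge these rows
                conn.commit();
            }
        }
    }
}
```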

Method 3:


public static void insertBigData(SpringBatchHandler sbh) {
    long begin = System.currentTimeMillis();
    // SpringBatchHandler is presumably the author's own helper (not shown) exposing a JdbcTemplate
    JdbcTemplate jdbcTemplate = sbh.getJdbcTemplate();
    final int count = 10000;
    String sql = "INSERT INTO tb_big_data (count, create_time, random) VALUES (?, SYSDATE(), ?)";
    jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
        // Sets the parameters for row i; called once for each i from 0 to getBatchSize() - 1
        @Override
        public void setValues(PreparedStatement pst, int i) throws SQLException {
            pst.setLong(1, i);
            pst.setInt(2, i);
        }

        // Number of rows in the batch, i.e. how many times setValues is called
        @Override
        public int getBatchSize() {
            return count;
        }
    });
    long end = System.currentTimeMillis();
    // Milliseconds / 1000 gives seconds
    System.out.println("cost: " + (end - begin) / 1000 + " s");
}

This method uses Spring's JdbcTemplate.batchUpdate. Because it is so slow, I only inserted 10,000 rows!

Elapsed time: 387 s
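The method-3 code never turns off auto-commit. Unless the surrounding Spring configuration wraps the call in a transaction, each of the 10,000 inserts may be committed on its own, which is one plausible (untested) contributor to the slow result. Spring also offers an overload, batchUpdate(String sql, Collection<T> batchArgs, int batchSize, ParameterizedPreparedStatementSetter<T> pss), that splits a collection into sub-batches. The plain-JDBC sketch below mirrors that idea inside a single transaction; ChunkedBatch, chunks, and insertAll are hypothetical names, and the (count, random) row layout is assumed from the article's table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ChunkedBatch {

    // Splits rows into consecutive sub-lists of at most `size` elements.
    static <T> List<List<T>> chunks(List<T> rows, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += size) {
            out.add(rows.subList(from, Math.min(from + size, rows.size())));
        }
        return out;
    }

    // Inserts (count, random) pairs in sub-batches inside one transaction,
    // rolling back everything if any sub-batch fails.
    static void insertAll(Connection conn, List<long[]> rows, int batchSize) throws SQLException {
        String sql = "INSERT INTO tb_big_data (count, create_time, random) VALUES (?, SYSDATE(), ?)";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement pst = conn.prepareStatement(sql)) {
            for (List<long[]> chunk : chunks(rows, batchSize)) {
                for (long[] row : chunk) {
                    pst.setLong(1, row[0]);
                    pst.setLong(2, row[1]);
                    pst.addBatch();
                }
                pst.executeBatch(); // one sub-batch; the statement's batch is then empty again
            }
            conn.commit(); // a single commit for all rows
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```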

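Summary: methods 1 and 2 are very similar. The only difference is that method 1 inserts with statements of the form "insert into tb (...) values (...),(...)...;", while method 2 works like "insert into tb (...) values (...);insert into tb (...) values (...);...". If I hadn't tested it, I would never have guessed the gap was this large!
Of course, these are only my current measurements, and the run time also depends heavily on the step (rows per commit). If the step were reduced to 100, the run might well take several minutes; feel free to try it yourself.
Method 3 is widely recommended online, but as the numbers show, 10,000 rows took 387 s (over 6 minutes), which is far from ideal. Method 3 also needs a Spring environment (here, a configured applicationContext) before it can be used.
That said, method 3 is still quite practical in SSH (Struts + Spring + Hibernate) or Spring MVC projects!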
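The comparison of methods 1 and 2, and the effect of the step, can be made concrete by counting how many INSERT statements reach the server. The sketch assumes method 2's batch is sent as one statement per row (MySQL Connector/J's behavior without rewriteBatchedStatements) and that method 1 sends one multi-row INSERT per step; it counts statements only and says nothing about actual run time. StatementCount and its methods are hypothetical names.

```java
public class StatementCount {

    // Method 1 shape: one multi-row INSERT per chunk of `step` rows.
    // Ceiling division, so a final partial chunk still costs one statement.
    static long multiRowStatements(long rows, long step) {
        if (step < 1) {
            throw new IllegalArgumentException("step must be at least 1");
        }
        return (rows + step - 1) / step;
    }

    // Method 2 shape without driver rewriting: one INSERT per row.
    static long singleRowStatements(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        long rows = 1_000_000;
        System.out.println(multiRowStatements(rows, 10_000)); // 100
        System.out.println(multiRowStatements(rows, 100));    // 10000
        System.out.println(singleRowStatements(rows));        // 1000000
    }
}
```

With a step of 100, method 1 sends 10,000 statements instead of 100, which is consistent with the expectation that a much smaller step would be much slower.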

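I have only just started looking into large-data problems. The results above come from real tests but are not necessarily the whole story; if you have better suggestions, please let me know. Thanks!
We learn faster by learning from each other!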

I'll post the source code later so you can download and test it yourself!
