
The O_DIRECT option on Linux

2015-09-03 19:03
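While reading man 2 open I noticed the O_DIRECT flag. With O_DIRECT, writes bypass the page cache and go directly to the device. For bulk data writes, skipping the cache looked like it ought to be faster, so I tried writing a file with O_DIRECT. Getting an O_DIRECT write to work turned out to be harder than expected: memory obtained from new or malloc generally does not work with O_DIRECT because it is not suitably aligned, so you have to use posix_memalign (or valloc or memalign, both of which are marked obsolete).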

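My guess (I have never studied the kernel, so it is only a guess) is this: for the fast path to work, the user-space buffer has to line up with the units the kernel and the device work in, so that the device can transfer data to and from it directly. posix_memalign returns memory whose start address is a multiple of the alignment you request, and the page size is a safe alignment to ask for. With O_DIRECT, the transfer length and the file offset also need to be aligned multiples of the block size; a multiple of the page size satisfies that.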
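To make the alignment point concrete, here is a minimal sketch of allocating a page-aligned buffer and checking its address. The helper names `is_aligned` and `alloc_page_aligned` are made up for this illustration; only `posix_memalign`, `sysconf` and `free` are real library calls.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
static bool is_aligned(const void* ptr, size_t alignment)
{
    return reinterpret_cast<uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical helper: allocate `len` bytes whose start address is
// aligned to the page size. Returns NULL on failure; release with free().
static void* alloc_page_aligned(size_t len)
{
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* p = NULL;
    // posix_memalign reports failure through its return value, not errno.
    if (posix_memalign(&p, page, len) != 0)
        return NULL;
    assert(is_aligned(p, page));
    return p;
}
```

A buffer from `alloc_page_aligned` can be handed to write() on an O_DIRECT descriptor; a pointer from plain malloc usually fails the same `is_aligned` check against the page size.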

Here is the file-write code that uses the O_DIRECT option:

// Writing 200MB took 3100994 microseconds (3.1 seconds): slower, and by a wide margin!

/*
 Large-write performance test, test_io_4.cpp: POSIX calls with the O_DIRECT flag
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // O_DIRECT is a Linux extension that glibc exposes under _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

// Elapsed time from rhs to lhs, in microseconds
static long operator-(const struct timeval& lhs, const struct timeval& rhs)
{
    return (lhs.tv_sec - rhs.tv_sec) * 1000000L + (lhs.tv_usec - rhs.tv_usec);
}

void test()
{
    struct timeval start;
    struct timeval end;
    const size_t DATA_LEN = 1024 * 1024 * 200; // 200MB
    char* pData = NULL;
    printf("page size=%d\n", getpagesize());
    // posix_memalign returns the error code directly and does not set errno
    int nTemp = posix_memalign((void**)&pData, getpagesize(), DATA_LEN);
    if (0 != nTemp)
    {
        fprintf(stderr, "posix_memalign error: %s\n", strerror(nTemp));
        return;
    }
    //pData[DATA_LEN-1] = '\0';
    gettimeofday(&start, NULL);
    // O_CREAT requires a mode argument; without it the file permissions are garbage
    int fd = open("write_direct.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
    {
        perror("open error");
        free(pData);
        return;
    }
    // write() may transfer fewer bytes than requested, so loop until done
    size_t nDone = 0;
    while (nDone < DATA_LEN)
    {
        ssize_t nLen = write(fd, pData + nDone, DATA_LEN - nDone);
        if (nLen <= 0)
        {
            perror("write error");
            close(fd);
            free(pData);
            return;
        }
        nDone += (size_t)nLen;
    }
    close(fd);
    fd = -1;
    gettimeofday(&end, NULL);
    free(pData);
    pData = NULL;

    // Print the elapsed time
    struct tm stTime;
    char strTemp[40];
    localtime_r(&start.tv_sec, &stTime);
    strftime(strTemp, sizeof(strTemp), "%Y-%m-%d %H:%M:%S", &stTime);
    // tv_usec has at most six digits and is not an int, hence %06ld
    printf("start=%s.%06ld\n", strTemp, (long)start.tv_usec);
    localtime_r(&end.tv_sec, &stTime);
    strftime(strTemp, sizeof(strTemp), "%Y-%m-%d %H:%M:%S", &stTime);
    printf("end  =%s.%06ld\n", strTemp, (long)end.tv_usec);
    printf("spend=%ld microseconds\n", end - start);
}

int main()
{
    test();
    return 0;
}

Using O_DIRECT actually made the file write slower. I could not work out why, until I finally found this article online:

"Linus really does not care for O_DIRECT, heh"
http://www.linux-ren.org/modules/newbb/viewtopic.php?topic_id=2722
-------------------------------------------
A thread on the lkml began with a query about using O_DIRECT when opening a file. An early white paper written by Andrea Arcangeli to describe the O_DIRECT patch before it was merged into the 2.4 kernel explains, "with O_DIRECT the kernel will do DMA directly from/to the physical memory pointed [to] by the userspace buffer passed as [a] parameter to the read/write syscalls. So there will be no CPU and memory bandwidth spent in the copies between userspace memory and kernel cache, and there will be no CPU time spent in kernel in the management of the cache (like cache lookups, per-page locks etc..)." Linux creator Linus Torvalds was quick to reply that despite all the claims there is no good reason for mounting files with O_DIRECT, suggesting that interfaces like madvise() and posix_fadvise() should be used instead.

From: Aubrey [email blocked]
To: "Hua Zhong" [email blocked]
Subject: O_DIRECT question
Date: Thu, 11 Jan 2007 10:57:06 +0800

Hi all,

Opening file with O_DIRECT flag can do the un-buffered read/write access.
So if I need un-buffered access, I have to change all of my
applications to add this flag. What's more, Some scripts like "cp
oldfile newfile" still use pagecache and buffer.
Now, my question is, is there a existing way to mount a filesystem
with O_DIRECT flag? so that I don't need to change anything in my
system. If there is no option so far, What is the right way to achieve
my purpose?

Thanks a lot.
-Aubrey

From: Linus Torvalds [email blocked]
Subject: Re: O_DIRECT question
Date: Wed, 10 Jan 2007 19:05:30 -0800 (PST)

On Thu, 11 Jan 2007, Aubrey wrote:
>
> Now, my question is, is there a existing way to mount a filesystem
> with O_DIRECT flag? so that I don't need to change anything in my
> system. If there is no option so far, What is the right way to achieve
> my purpose?

The right way to do it is to just not use O_DIRECT.

The whole notion of "direct IO" is totally braindamaged. Just say no.

    This is your brain: O
    This is your brain on O_DIRECT: .

    Any questions?

I should have fought back harder. There really is no valid reason for EVER
using O_DIRECT. You need a buffer whatever IO you do, and it might as well
be the page cache. There are better ways to control the page cache than
play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead.

Linus

From: Linus Torvalds [email blocked]
Subject: Re: O_DIRECT question
Date: Wed, 10 Jan 2007 19:15:48 -0800 (PST)

On Wed, 10 Jan 2007, Linus Torvalds wrote:
>
> So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
> instead.

Side note: the only reason O_DIRECT exists is because database people are
too used to it, because other OS's haven't had enough taste to tell them
to do it right, so they've historically hacked their OS to get out of the
way.

As a result, our madvise and/or posix_fadvise interfaces may not be all
that strong, because people sadly don't use them that much. It's a sad
example of a totally broken interface (O_DIRECT) resulting in better
interfaces not getting used, and then not getting as much development
effort put into them.

So O_DIRECT not only is a total disaster from a design standpoint (just
look at all the crap it results in), it also indirectly has hurt better
interfaces. For example, POSIX_FADV_NOREUSE (which _could_ be a useful and
clean interface to make sure we don't pollute memory unnecessarily with
cached pages after they are all done) ends up being a no-op ;/

Sad. And it's one of those self-fulfilling prophecies. Still, I hope some
day we can just rip the damn disaster out.

Linus

From: Andrew Morton [email blocked]
Subject: Re: O_DIRECT question
Date: Wed, 10 Jan 2007 20:51:57 -0800

On Thu, 11 Jan 2007 10:57:06 +0800
Aubrey [email blocked] wrote:

> Hi all,
>
> Opening file with O_DIRECT flag can do the un-buffered read/write access.
> So if I need un-buffered access, I have to change all of my
> applications to add this flag. What's more, Some scripts like "cp
> oldfile newfile" still use pagecache and buffer.
> Now, my question is, is there a existing way to mount a filesystem
> with O_DIRECT flag? so that I don't need to change anything in my
> system. If there is no option so far, What is the right way to achieve
> my purpose?

Not possible, basically.

O_DIRECT reads and writes must be aligned to the device's block size
(usually 512 bytes) in memory addresses, file offsets and read/write request
sizes. Very few applications will bother to do that and will hence fail if
their files are automagically opened with O_DIRECT.

----------------------------------------------------------------------------------
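Andrew Morton's three alignment conditions (memory address, file offset, request size) can be written down as one small check. This is only a sketch: `direct_io_aligned` is a name invented here, and the block size is passed in by the caller rather than queried from the device.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

// Hypothetical helper, not part of any library: true if a request meets
// all three O_DIRECT alignment conditions for the given block size.
// Assumes offset is non-negative and block is non-zero.
static bool direct_io_aligned(const void* buf, off_t offset,
                              size_t len, size_t block)
{
    const uintptr_t addr = reinterpret_cast<uintptr_t>(buf);
    return addr % block == 0                              // memory address
        && static_cast<uint64_t>(offset) % block == 0     // file offset
        && len % block == 0;                              // request size
}
```

With the 512-byte block size mentioned in the mail, a request at offset 100 or of length 500 fails the check, which is why an ordinary application would break if its files were silently opened with O_DIRECT.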

Heh, note this sentence: The whole notion of "direct IO" is totally braindamaged. Well, it looks like O_DIRECT has been discouraged for a long time already.
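The thread gives no code for the madvise()/posix_fadvise() alternative, so the following is only my sketch of how that advice might be applied to a bulk write: write through the page cache as usual, flush, then tell the kernel the cached pages are no longer needed. The function name `write_then_drop_cache` is made up; `POSIX_FADV_DONTNEED` is a hint, and how strongly the kernel honours it is not guaranteed.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: buffered write of `len` bytes to `path`, followed
// by a hint that the file's cached pages can be dropped.
// Returns 0 on success, -1 on failure.
static int write_then_drop_cache(const char* path, const char* data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t done = 0;
    while (done < len) {                    // write() may be partial
        ssize_t n = write(fd, data + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    // Dirty pages cannot be discarded, so flush them to disk first.
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    // offset 0 with len 0 means "the whole file". posix_fadvise returns
    // an error number directly instead of setting errno.
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Unlike the O_DIRECT version, this needs no aligned buffer, so data from plain malloc or new works unchanged.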
NE:

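With O_DIRECT you have to do I/O in aligned, block-sized units (page-sized units are a safe choice). There is no way around it, because the underlying device is a block device. You can add a layer of intermediate code that computes the aligned file offset itself, allocates an aligned buffer with posix_memalign, performs the I/O, and then copies the contents of that buffer into the caller's buffer.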