

2016-03-23 23:04

How to C in 2016

This is a draft I wrote in early 2015 and never got around to publishing. Here's the mostly unpolished version because it wasn't doing anybody any good sitting in my drafts folder. The simplest change was updating year 2015 to 2016 at publication time.

Feel free to submit fixes/improvements/complaints as necessary. -Matt

Adrián Arroyo Calle provides a Spanish translation at ¿Cómo programar en C (en 2016)?

Japanese POSTD provides a Japanese translation at 2016年、C言語はどう書くべきか (前編) and 2016年、C言語はどう書くべきか (後編).

Chinese InfoQ provides a Chinese translation at C语言的2016.

Programmer Magazine provides a Chinese translation at 2016年,C语言该怎样写 (as PDF too).

Keith Thompson provides a nice set of corrections and alternative opinions at howto-c-response.

Rob Graham provides a response covering other avenues out of scope here at Some notes C in 2016.

Now on to the article...

The first rule of C is don't write C if you can avoid it.

If you must write in C, you should follow modern rules.

C has been around since the early 1970s. People have "learned C" at various points during its evolution, but knowledge usually gets stuck after learning, so everybody has a different set of things they believe about C based on the year(s) they first started learning.

It's important to not remain stuck in your "things I learned in the 80s/90s" mindset of C development.

This page assumes you are on a modern platform conforming to modern standards and you have no excessive legacy compatibility requirements. We shouldn't be globally tied to ancient standards just because some companies refuse to upgrade 20-year-old systems.

Preflight

Standard c99 (c99 means "C Standard from 1999"; c11 means "C Standard from 2011", so 11 > 99).

clang, default

clang uses an extended version of C11 by default (GNU C11 mode), so no extra options are needed for modern features.
If you want standard C11, you need to specify -std=c11; if you want standard C99, use -std=c99.
clang compiles your source files faster than gcc

gcc before version 5 requires you specify -std=c99 or -std=c11

gcc builds source files slower than clang, but sometimes generates faster code. Performance comparisons and regression testing are important.
gcc-5 defaults to GNU C11 mode (same as clang), but if you need exactly c11 or c99, you should still specify -std=c11 or -std=c99.

Optimizations

-O2, -O3

generally you want -O2, but sometimes you want -O3. Test under both levels (and across compilers) then keep the best performing binaries.

-Os

-Os helps if your concern is cache efficiency (which it should be)

Warnings

-Wall -Wextra -pedantic

newer compiler versions have -Wpedantic, but they still accept the ancient -pedantic as well for wider backwards compatibility.

during testing you should add -Werror and -Wshadow on all your platforms

it can be tricky deploying production source using -Werror because different platforms and compilers and libraries can emit different warnings. You probably don't want to kill a user's entire build just because their version of GCC on a platform you've never seen complains in new and wondrous ways.

extra fancy options include -Wstrict-overflow -fno-strict-aliasing

Either specify -fno-strict-aliasing or be sure to only access objects as the type they have at creation. Since so much existing C code aliases across types, using -fno-strict-aliasing is a much safer bet if you don't control the entire underlying source tree.

as of now, Clang reports some valid syntax as a warning, so you should add -Wno-missing-field-initializers

GCC fixed this unnecessary warning after GCC 4.7.0

Building

Compilation units

The most common way of building C projects is to compile every source file into an object file then link all the objects together at the end. This procedure works great for incremental development, but it is suboptimal for performance and optimization. Your compiler can't detect potential optimizations across file boundaries this way.

LTO — Link Time Optimization

LTO fixes the "source analysis and optimization across compilation units problem" by annotating object files with intermediate representation so source-aware optimizations can be carried out across compilation units at link time.
LTO can slow down the linking process noticeably, but make -j helps if your build includes multiple non-interdependent final targets (.a, .so, .dylib, testing executables, application executables, etc).
clang LTO (guide)
gcc LTO
As of 2016, clang and gcc releases support LTO by just adding -flto to your command line options during object compilation and final library/program linking.
LTO still needs some babysitting though. Sometimes, if your program has code not used directly but used by additional libraries, LTO can evict functions or code because it detects, globally when linking, some code is unused/unreachable and doesn't need to be included in the final linked result.

Arch

-march=native

give the compiler permission to use your CPU's full feature set
again, performance testing and regression testing are important (then comparing the results across multiple compilers and/or compiler versions) to make sure any enabled optimizations don't have adverse side effects.

-msse2 and -msse4.2 may be useful if you need to target not-your-build-machine features.

Writing code

Types

If you find yourself typing char or int or short or long or unsigned into new code, you're doing it wrong.

For modern programs, you should #include <stdint.h> then use standard types.

For more details, see the stdint.h specification.

The common standard types are:

int8_t, int16_t, int32_t, int64_t - signed integers
uint8_t, uint16_t, uint32_t, uint64_t - unsigned integers
float - standard 32-bit floating point
double - standard 64-bit floating point

Notice we don't have char anymore. char is actually misnamed and misused in C.

Developers routinely abuse char to mean "byte" even when they are doing unsigned byte manipulations. It's much cleaner to use uint8_t to mean a single unsigned-byte/octet-value and uint8_t * to mean sequence-of-unsigned-byte/octet-values.

Special Standard Types

In addition to standard fixed-width types like uint16_t and int32_t, we also have fast and least types defined in the stdint.h specification.

Fast types are:

int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t - signed integers
uint_fast8_t, uint_fast16_t, uint_fast32_t, uint_fast64_t - unsigned integers

Fast types provide a minimum of X bits, but there is no guarantee the underlying storage size is exactly what you request. If a larger type has better support on your target platform, a fast type will automatically use the better supported larger type.

The best example here is, on some 64-bit systems, when you request uint_fast16_t you actually get a uint64_t because operating on word-sized integers will be faster than operating on half of a 32-bit integer.

The fast guidelines aren't followed on every system though. One standout is OS X, where fast types are defined exactly as their corresponding fixed width counterparts.

Fast types can be useful for self-documenting code as well. If you know your counters only need 16 bits, but you prefer your math use 64 bit integers because they are faster on your platform, that's where uint_fast16_t would help. Under 64-bit Linux platforms, uint_fast16_t gives you a fast 64-bit counter while maintaining the code-level inline documentation of "we only need 16 bits here."

One thing to be aware of for fast types: it can impact certain test cases. If you need to test for storage width edge cases, having uint_fast16_t be 16 bits on some platforms (OS X) and 64 bits on other platforms (Linux) can increase the minimum number of platforms where your tests need to pass.

Fast types do introduce the same uncertainty as int not being a standard size across platforms, but with fast types, you can limit your uncertainty to known-safe locations in your code (counters, temporary values with checked bounds, etc).

Least types are:

int_least8_t, int_least16_t, int_least32_t, int_least64_t - signed integers
uint_least8_t, uint_least16_t, uint_least32_t, uint_least64_t - unsigned integers

Least types provide you with the most compact number of bits for the type you request.

The least guidelines, in practice, mean least types are just defined to standard fixed width types, since standard fixed width types already provide the exact minimum number of bits you request.

to int or not to int

Some readers have pointed out they truly love int and you'll have to pry it from their cold dead fingers. I'd like to point out it is technically impossible to program correctly if the sizes of your types change out from under you.

Also see RATIONALE included with inttypes.h for reasons why using non-fixed-width types is unsafe. If you are truly smart enough to conceptualize int being 16 bits on some platforms and 32 bits on other platforms throughout your development while also testing all 16 bit and 32 bit edge cases for every place you use int, please feel free to use int.

For the rest of us who can't hold entire multi-level decision tree platform specification hierarchies in our heads while writing fizzbuzz, we can use fixed width types and automatically have more correct code with much less conceptual hassle and much less required testing overhead.

Or, said more concisely in the specification: "the ISO C standard integer promotion rule can produce silent changes unexpectedly."

Good luck with that.

One Exception to never-char

The only acceptable use of char in 2016 is if a pre-existing API requires char (e.g. strncat, printf'ing "%s", ...) or if you're initializing a read-only string (e.g. const char *hello = "hello";) because the C type of string literals ("hello") is char [].

ALSO: In C11 we have native unicode support, and the type of UTF-8 string literals is still char [] even for multibyte sequences like const char *abcgrr = u8"abc