Data types in C:
The type system in C is static and weakly typed, which makes it similar to the
type system of ALGOL descendants such as Pascal.[28] There are built-in types
for integers of various sizes, both signed and unsigned, floating-point
numbers, and enumerated types (enum). The integer type char is often used for
single-byte characters. C99 added a Boolean data type (_Bool). There are also
derived types including arrays, pointers, records (struct), and untagged unions
(union).
C is often used in
low-level systems programming where escapes from the type system may be
necessary. The compiler attempts to ensure type correctness of most
expressions, but the programmer can override the checks in various ways, either
by using a type cast to explicitly convert a value from one type to another, or
by using pointers or unions to reinterpret the underlying bits of a data object
in some other way.
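The two kinds of escape described above can be sketched as follows. This is
only an illustration: the function names are hypothetical, and the union
example assumes that float and uint32_t are both 32 bits wide and that float
uses the IEEE 754 single-precision format.

```c
#include <stdint.h>

/* Explicit conversion: the cast produces a new value of another type,
   here by discarding the fractional part. */
int truncate_to_int(double d)
{
    return (int)d;
}

/* Reinterpretation: the same bytes are read back as a different type.
   Assumes float and uint32_t have the same size (32 bits). */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

The cast changes the value (3.9 becomes 3), whereas the union leaves the bits
untouched and only changes how they are interpreted.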
Some find C's
declaration syntax unintuitive, particularly for function pointers. (Ritchie's
idea was to declare identifiers in contexts resembling their use:
"declaration reflects use".)[29]
C's usual arithmetic conversions allow for efficient code to be generated, but
can sometimes produce unexpected results. For example, a comparison of signed
and unsigned integers of equal width requires a conversion of the signed value
to unsigned. If the signed value is negative, it is converted to a large
positive value, so a comparison such as -1 < 1u evaluates to false.
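A minimal sketch of the pitfall, with one possible way around it. The function
names are hypothetical, and many compilers can warn about the first comparison
when asked to.

```c
#include <stdbool.h>

/* Naive comparison: i is converted to unsigned int before the comparison,
   so a negative i is treated as a very large value. */
bool naive_less(int i, unsigned int u)
{
    return i < u;
}

/* Value-correct comparison: rule out negative values first, then compare
   two unsigned quantities. */
bool safe_less(int i, unsigned int u)
{
    return i < 0 || (unsigned int)i < u;
}
```

With these definitions, naive_less(-1, 1u) is false even though -1 is
mathematically smaller than 1, while safe_less(-1, 1u) is true.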
Pointers:
C supports
the use of pointers, a type of reference that records the address or location
of an object or function in memory. Pointers can be dereferenced to access data
stored at the address pointed to, or to invoke a pointed-to function. Pointers
can be manipulated using assignment or pointer arithmetic. The run-time
representation of a pointer value is typically a raw memory address (perhaps
augmented by an offset-within-word field), but since a pointer's type includes
the type of the thing pointed to, expressions including pointers can be
type-checked at compile time. Pointer arithmetic is automatically scaled by the
size of the pointed-to data type. Pointers are used for many purposes in C.
Text strings are commonly manipulated using pointers into arrays of characters.
Dynamic memory allocation is performed using pointers. Many data types, such as
trees, are commonly implemented as dynamically allocated struct objects linked
together using pointers. Pointers to functions are useful for passing functions
as arguments to higher-order functions (such as qsort or bsearch) or as
callbacks to be invoked by event handlers.
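Two of the points above, scaled pointer arithmetic and function pointers passed
to qsort, might be illustrated like this. The helper names are hypothetical;
qsort itself is the standard library function declared in stdlib.h.

```c
#include <stddef.h>
#include <stdlib.h>

/* Distance in bytes between p and p + 1 for an int pointer:
   it equals sizeof(int), not 1. */
size_t int_stride(void)
{
    int a[2] = {0, 0};
    int *p = a;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Comparison callback in the form qsort expects. */
int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Sort in ascending order by handing qsort a pointer to compare_ints. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```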
A null pointer value explicitly points to no valid location. Dereferencing a
null pointer value has undefined behavior, and often results in a segmentation
fault. Null
pointer values are useful for indicating special cases such as no
"next" pointer in the final node of a linked list, or as an error
indication from functions returning pointers. In appropriate contexts in source
code, such as for assigning to a pointer variable, a null pointer constant can
be written as 0, with or without explicit casting to a pointer type, or as the
NULL macro defined by several standard headers. In conditional contexts, null
pointer values evaluate to false, while all other pointer values evaluate to
true.
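As a sketch of both uses, the hypothetical linked-list helpers below rely on a
null "next" pointer to mark the end of the list and return a null pointer to
report that nothing was found. The struct and function names are illustrative
only.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;   /* NULL marks the end of the list */
};

/* Count nodes by following next pointers until a null pointer is reached. */
size_t list_length(const struct node *head)
{
    size_t n = 0;
    while (head) {            /* a null pointer tests as false */
        n++;
        head = head->next;
    }
    return n;
}

/* Return the first node holding value, or NULL to signal "not found". */
const struct node *list_find(const struct node *head, int value)
{
    for (; head != NULL; head = head->next)
        if (head->value == value)
            return head;
    return NULL;
}
```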
Void pointers
(void *) point to objects of unspecified type, and can therefore be used as
"generic" data pointers. Since the size and type of the pointed-to
object is not known, void pointers cannot be dereferenced, nor is pointer
arithmetic on them allowed, although they can easily be (and in many contexts
implicitly are) converted to and from any other object pointer type.[27]
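A generic routine built on void pointers could look like the sketch below. The
function name swap_bytes is hypothetical; the caller has to supply the object
size because the void pointer carries no type information, and the two objects
are assumed not to overlap.

```c
#include <stddef.h>

/* Swap two objects of any type, one byte at a time. */
void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *pa = a;   /* implicit conversion from void * */
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

The same function can then be called as swap_bytes(&x, &y, sizeof x) whether x
and y are ints, doubles, or structs of the same type.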
Careless use
of pointers is potentially dangerous. Because they are typically unchecked, a
pointer variable can be made to point to any arbitrary location, which can
cause undesirable effects. Although properly used pointers point to safe
places, they can be made to point to unsafe places by using invalid pointer
arithmetic; the objects they point to may continue to be used after
deallocation (dangling pointers); they may be used without having been
initialized (wild pointers); or they may be directly assigned an unsafe value
using a cast, union, or through another corrupt pointer. In general, C is
permissive in allowing manipulation of and conversion between pointer types,
although compilers typically provide options for various levels of checking.
Some other programming languages address these problems by using more
restrictive reference types.
Arrays:
Array types in C are traditionally of a fixed, static size specified at compile
time. (C99 also allows a form of variable-length arrays.) However, it is also
possible to allocate a block of
memory (of arbitrary size) at run-time, using the standard library's malloc
function, and treat it as an array. C's unification of arrays and pointers
means that declared arrays and these dynamically allocated simulated arrays are
virtually interchangeable.
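A short sketch of this interchangeability, using hypothetical helper names:
make_sequence allocates a block with malloc and fills it through ordinary
subscripts, and sum_ints accepts a declared array or an allocated block alike.

```c
#include <stdlib.h>

/* Allocate count ints at run time and fill them with 0, 1, 2, ...
   Returns NULL if allocation fails; the caller must free the result. */
int *make_sequence(size_t count)
{
    int *seq = malloc(count * sizeof *seq);
    if (seq == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        seq[i] = (int)i;   /* same subscript syntax as a declared array */
    return seq;
}

/* Works identically on a declared array and on a malloc'd block. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```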
Since arrays are always accessed (in effect) via pointers,
array accesses are typically not checked against the underlying array size,
although some compilers may provide bounds checking as an option.[30] Array
bounds violations are therefore possible and rather common in carelessly
written code, and can lead to various repercussions, including illegal memory
accesses, corruption of data, buffer overruns, and run-time exceptions. If
bounds checking is desired, it must be done manually.
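Manual bounds checking usually means carrying the element count alongside the
pointer and testing the index before each access. A minimal sketch, with a
hypothetical function name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Store value at values[index] only if index is within bounds.
   Returns true on success and false if the index is out of range. */
bool checked_set(int *values, size_t count, size_t index, int value)
{
    if (index >= count)
        return false;
    values[index] = value;
    return true;
}
```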
C does not have a special provision for declaring multi-dimensional
arrays, but rather relies on recursion within the type system to declare arrays
of arrays, which effectively accomplishes the same thing. The index values of
the resulting "multi-dimensional array" can be thought of as
increasing in row-major order.
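The row-major layout can be checked with a small sketch. The names ROWS, COLS,
and element_offset are hypothetical; the function reports how many elements
lie between the start of the array and m[r][c].

```c
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* Offset of m[r][c] from the start of the array, counted in elements.
   The parameter is an array of arrays: each m[r] is an int[COLS]. */
size_t element_offset(int m[ROWS][COLS], size_t r, size_t c)
{
    const char *base = (const char *)m;
    const char *elem = (const char *)&m[r][c];
    return (size_t)(elem - base) / sizeof m[0][0];
}
```

For an in-range r and c the result is r * COLS + c, so the last index varies
fastest and each row is stored contiguously after the previous one.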
Multi-dimensional arrays are commonly used in numerical
algorithms (mainly from applied linear algebra) to store matrices. The
structure of the C array is well suited to this particular task. However, since
arrays are passed merely as pointers, the bounds of the array must be known
fixed values or else explicitly passed to any subroutine that requires them,
and dynamically sized arrays of arrays cannot be accessed using double
indexing. (A workaround for this is to allocate the array together with an
additional vector of pointers, one pointing to the start of each row.)
C99 introduced "variable-length arrays" which
address some, but not all, of the issues with ordinary C arrays.
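Both approaches to matrices whose size is known only at run time can be
sketched as follows. The function names are hypothetical, matrix_alloc assumes
that rows and cols are both greater than zero, and matrix_sum requires a
compiler with C99 variable-length array support.

```c
#include <stdlib.h>

/* Pointer-vector workaround: one contiguous block of cells plus a vector
   of pointers to the start of each row, so that m[r][c] works.
   Returns NULL if allocation fails. */
double **matrix_alloc(size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    double *cells = malloc(rows * cols * sizeof *cells);
    if (m == NULL || cells == NULL) {
        free(m);
        free(cells);
        return NULL;
    }
    for (size_t r = 0; r < rows; r++)
        m[r] = cells + r * cols;   /* start of row r */
    return m;
}

void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* the block of cells */
        free(m);      /* the vector of row pointers */
    }
}

/* C99 alternative: a variable-length array parameter lets the callee use
   double indexing on a contiguous array whose bounds are passed in. */
double matrix_sum(size_t rows, size_t cols, double m[rows][cols])
{
    double total = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r][c];
    return total;
}
```

The pointer-vector version costs one extra pointer per row and an extra
indirection per access; the variable-length version avoids both, but the
bounds still have to be passed explicitly.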
Array–pointer interchangeability:
The subscript notation x[i] (where x designates a pointer) is syntactic sugar
for *(x+i).[31] Taking advantage of the compiler's knowledge of the pointer
type, the address that x + i designates is not the base address (pointed to by
x) incremented by i bytes, but rather the base address incremented by i
multiplied by the size of an element that x points to. Thus, x[i] designates
the (i+1)th element of the array.
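The equivalence can be spelled out in code. In this sketch the function names
are hypothetical: element_at uses *(x + i) in place of x[i], and address_of
computes the element address by hand in bytes.

```c
#include <stddef.h>

/* Read element i using explicit pointer arithmetic instead of x[i]. */
int element_at(const int *x, size_t i)
{
    return *(x + i);
}

/* Address of element i computed manually: the base address plus
   i * sizeof(element) bytes. This is the same address as x + i. */
const int *address_of(const int *x, size_t i)
{
    return (const int *)((const char *)x + i * sizeof *x);
}
```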
Furthermore, in most expression contexts (a notable
exception is as operand of sizeof), the name of an array is automatically
converted to a pointer to the array's first element. This implies that an array
is never copied as a whole when named as an argument to a function, but rather
only the address of its first element is passed. Therefore, although function
calls in C use pass-by-value semantics, arrays are in effect passed by
reference.
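The difference between passing an array and passing a plain value might be
shown like this (function names are hypothetical):

```c
#include <stddef.h>

/* The parameter is really a pointer to the first element, so writes
   through it reach the caller's array. */
void zero_fill(int values[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        values[i] = 0;
}

/* A plain int is copied, so the caller's variable is unaffected. */
void zero_copy(int value)
{
    value = 0;
    (void)value;   /* only the local copy was changed */
}
```

After zero_fill(a, 3) the caller's array a holds zeros, whereas after
zero_copy(n) the caller's n keeps its original value.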
The size of an element can be determined by applying the operator sizeof to any
dereferenced element of x, as in n = sizeof *x or n = sizeof x[0], and the
number of elements in a declared array A can be determined as sizeof A / sizeof
A[0]. The latter only works when A is a true array name, that is, a variable
declared with subscripts (int A[20]). It is not possible to determine the size
of an array through a pointer to its elements, including a pointer returned by
dynamic allocation (malloc); code such as sizeof arr / sizeof arr[0] (where arr
designates a pointer) will not work, since sizeof then yields the size of the
pointer itself.[32][33] Since array names used as operands of sizeof are not
converted to pointers, they do not exhibit such ambiguity.
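A brief sketch of the distinction. ARRAY_LEN is a hypothetical helper macro,
not a standard one, and it gives a meaningful result only when handed a
declared array.

```c
#include <stddef.h>

/* Element count of a declared array; must be given the array itself. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Inside a function the parameter is a pointer, so sizeof yields the
   size of the pointer and the array length is no longer available. */
size_t size_seen_by_callee(const int *arr)
{
    return sizeof arr;
}
```

Given int A[20], ARRAY_LEN(A) is 20, but size_seen_by_callee(A) returns
sizeof(int *) regardless of how many elements A has.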
Thus, despite this apparent equivalence between array and pointer variables,
there is still a distinction to be made between them. Even though the name of
an array is, in most expression contexts, converted into a pointer (to its
first element), this pointer does not itself occupy any storage; the array name
is not a modifiable l-value, and its address is a constant, unlike a pointer
variable. Consequently, what an array "points to" cannot be changed, and it is
impossible to assign a new address to an array name. Array contents may be
copied, however, by using the memcpy function, or by accessing the individual
elements.
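Both ways of copying array contents can be sketched as follows. LEN and the
function names are hypothetical; memcpy is the standard function declared in
string.h, and the source and destination are assumed not to overlap.

```c
#include <string.h>

enum { LEN = 4 };

/* Assigning one array to another (dst = src) does not compile,
   so the contents are copied as a block of bytes instead. */
void copy_with_memcpy(int dst[LEN], const int src[LEN])
{
    memcpy(dst, src, LEN * sizeof dst[0]);
}

/* The same copy done one element at a time. */
void copy_by_element(int dst[LEN], const int src[LEN])
{
    for (size_t i = 0; i < LEN; i++)
        dst[i] = src[i];
}
```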