Wednesday, March 31, 2010

Side effects

Even a panacea can have side effects: people come to depend on it too much and grow lazy. Even C is not untouched by this.
The best way to explain this is with examples. Consider the snippet below:

/*
int a=5;
printf("%d %d %d %d %d",a++,a++,++a,a++,++a);
*/

A superhit snippet, and one of the most common queries posted on forums.
The confusion arises because this innocent-looking snippet gives different outputs under different environments.
Most of the responses are a two-word phrase such as "Undefined Behaviour" or "Sequence Points", leaving the confused person even more confused and ...(undefined verb!!!)

Explaining the innocent-looking code above requires covering a few important topics. Let's take them one by one!

1> Undefined Behaviour
Undefined behaviour is behaviour for which the C standard imposes no requirements: anything may happen. The most common example comes in the following form:

/*
int a;
a++;  // a will have garbage
*/

Here the compiler is free to leave any value in 'a', or to do anything else. Similar instances are division by zero and indexing outside the bounds of an array.

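2> Unspecified Behaviour
This arises when the standard allows more than one outcome and lets each compiler choose, so the same code can behave differently under different compilers. The order in which function arguments are evaluated is one example. A close relative is implementation-defined behaviour, where the compiler must document its choice: the size of an integer is 2 bytes under Turbo C/C++ and 4 bytes under GCC.
It's like you behaving differently in front of your parents and in front of your friends ;)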

3> Side effects
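A side effect is a change made while evaluating an expression, such as modifying a variable. This brings us to the order of evaluation in C. The C language does not specify the order in which the operands of an operator are evaluated, with a few exceptions such as &&, ||, ?: and the comma operator (,).
For example: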

x=f() + g();

f() may be evaluated before g(), or vice versa. If either f() or g() alters a variable on which the other depends, x can depend on the order of evaluation.

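But if we write something like:
x=f(),g();
then the order of evaluation is defined [f() is called first, then g()] because the comma operator is an exception, as mentioned earlier. Note that assignment binds tighter than the comma, so here x receives the value of f(); write x=(f(),g()); to store the value of g().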

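The best illustration of the ambiguity in order of evaluation is the statement:

a[i]=i++;
The subscript can be the old value of i or the new one. In fact, the standard makes this statement undefined, because i is modified and also read for another purpose with no sequence point in between.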


Now, we come back to our good old snippet.

int a=5;
printf("%d %d %d %d %d",a++,a++,++a,a++,++a);

The following reasons can be given for the outcome of the above snippet:
* The commas in a function call separate arguments; they are not the comma operator and do not guarantee an order of evaluation, so we can't say which of the a's will be evaluated first.
* This leaves the compiler free to evaluate the arguments in any order. This should explain the confusing outputs that might appear.
* Because 'a' is modified several times with no sequence point in between, the behaviour is undefined, so the same snippet can give different outputs under different compilers and environments.

To summarise,
Compilers interpret these things in different ways and generate different answers depending on their interpretation. The standard intentionally leaves most such matters unspecified. Writing code that depends on the order of evaluation is bad programming practice and should be avoided.

Have fun with C!

Sunday, February 14, 2010

Of Standards & Compilers

Pre/Post/Extra-Marital Affairs between Standards and Compilers

The first thing we do when we start learning a new programming language is to try to execute programs.
C is no different, and for that we need a compiler.

Going by popularity, we pick up a recent book, like Let Us C, and use the Turbo C compiler.

We might come across a code snippet somewhere, try to run it in the Turbo C compiler, and get error messages.
We change the compiler, get the latest download, come across another piece of code somewhere, try to run it, and get an error message again, although the code may be perfectly all right.

And being beginners, we have no idea what is happening!
Is the book wrong, or is there a problem with the compiler?

Here is something regarding that.

The C language was created in 1972, and the Turbo C compiler was introduced in 1987.

Over time, the C language standards, published by the ANSI committee and later by ISO, have undergone changes: from C89 to C90 to C99, where the digits represent the year of publication. Every standard introduced something new and modified something old.

Over time, some compiler vendors have managed to update their compilers according to the standards. Some provided backward compatibility, while some discontinued technical support.

As a result, we have numerous compilers in the market, each following one standard or another, which creates huge confusion. At the same time, none of them is faulty <overlooking the bugs>.

It's important to understand the connection between them.

For example:
The latest Turbo C (or Turbo C++) by Borland is the 1990 version, which means it can implement only the C90 standard at most. The following line, which is actually C++ rather than C, compiles perfectly under Turbo C++.

#include<iostream.h>   // header file to include input/output stream.

But the same line, when compiled under the Dev-C++ IDE using the GCC compiler, a much more modern IDE and compiler, gives errors.
Does that mean that GCC is a faulty compiler? Definitely not.

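GCC is free to implement the newer standards. Since this header belongs to C++, the relevant change came with the C++98 standard rather than C99: it dropped the .h suffix from the standard header names.
The above snippet, with a little modification, will compile perfectly in Dev-C++.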

#include<iostream>   // header file to include input/output stream.

but it will give an error in Turbo C/C++. Again, this doesn't mean that Turbo C/C++ is a faulty compiler, just that it predates the newer standards and so cannot implement them.

For beginners, I recommend the Dev-C++ IDE, which uses the GCC compiler, along with a book written according to the new standards.


.........

< I do not mean to be biased against other compilers like Borland 5.5 and MSVC++ (beware, it's Microsoft's), but Dev-C++ is my personal favourite. >

For more info on the topic, use Google.

Have fun with C !

\ character

Hello,
here we will see the use of the standalone '\' or backslash character.

We take it for granted that the \ character is used with an additional character, such as n, a, b, t, etc., giving rise to escape sequences like '\a', '\n', '\b', '\t', etc., thereby overlooking its standalone use.

Here is a demo of its use!

St1: printf("hello,
St2: world");

At first glance, this might look all right to many, since C is a free-form language.
But upon compilation, the above snippet gives a 'missing terminating "' error on both lines, because a string literal cannot run across a line break: every opening " needs a closing " on the same line.

It is like algebra, where every opening parenthesis, (, should have a closing parenthesis, ).

So! Is there any alternative?

Sure there is; that's where the '\' character comes into play. That's what C is about.

The backslash character (\), placed as the very last character on a line, is used as a continuation character to bypass the newline.

Rewriting the above snippet:

St1: printf("hello, \
St2: world");

o/p: hello, world.

1. The \ character is mainly used where a line is too long, exceeds the screen width, and would otherwise hinder readability.

2. More specifically, it is used in macros, when the sequence of operations becomes too long.


Have fun with C !