color cycle (slow)

Kistaro Windrider, Reptillian Situation Assessor

Unfortunately, I Really Am That Nerdy

Fellow C programmers, you'll understand this immediately...

I read the documentation for gets(char*) for the first time today; I'd always used other functions and never bothered looking up gets before. I just finally had to read all of stdio.h, and found out exactly what it does.

I think, even through the thick walls of the dorm, my neighbors heard my scream.

I'm still too traumatized to explain it. Other programmers here are invited to explain to the rest of the audience exactly why my reaction is perfectly understandable.

edited to add:
Perhaps the GNU documentation on gets(char*) says it best...
Never use gets(). ...

*grin* The function that has caused more buffer overruns than any other function. Crackers know it and love it well.

I checked just for fun. I figured that gcc had marked gets with __attribute__ ((deprecated)), the gcc extension that causes a warning whenever the function is used. Trying to compile something with gets gave me an amusing and pleasant surprise: rather than __attribute__ ((deprecated)), it's been granted __attribute__ ((poison)), which makes any use of gets an error fatal to compilation.



I'm amused by the description on www.cplusplus.com:

"There is no limit on how many characters gets may read, so it's your job to determinate the length of the buffer you will need."

Last I knew, determine was a word, but I've never heard of "determinate" before.

Determinate is a word, but it's not a verb. It's an adjective, so it's being misused there. An example of a correct use would be something like: "There are a determinate number of combinations from rolling two six-sided dice."

Yeah ... I would evidently have been thinking in verb mode and not just in general.

Wow. I actually did wake up this morning when my alarms went off and really did post that second comment. The scary thing is that I didn't know for certain until I checked.

Further clarification on my two sleep-fogged comments: seeing "determinate" used as though it were "determine," I was hearing it in my head as determine + -ate, which bears little to no resemblance to the actual adjective, and I was only thinking of verbs anyway, as well as analyzing it based on my intro to linguistics class (or at least as much as my half-asleep brain could manage).



gets() actually had a use a long time ago. If you have your compiler rigged to associate a particular char pointer with a special output device then gets() is a fast way to write to it. Presumably one cannot overrun the output device.

Nowadays that kind of trickery is no longer worth any speed gain it would give.


That's not actually true. It would be true if the char* were not automatically incremented as part of gets' behaviour, but its intent is to write a string, not to write to a particular target register (see also Duff's Device). gets() is just a trivial get-char-from-input, put-char-to-output, increment-output-pointer loop, but that does mean the output pointer is getting incremented. So the first character would hit the target, and the next one? Well, whoopsie.

Quite a few C functions have that recommendation.

strtok / strtok_r:
Never use these functions. If you do, note that: ...

This strongly reminds me of the UNIX-HATERS Handbook. Surely a good programmer would fix the bugs, not just write some note hidden in the documentation and release it anyway.

Well, note my reply to chipuni: it's been as fixed as it's possible to get, because a program using gets will fail to compile.

The problem with gets is inherent to the function prototype and is not anything that can be fixed by any amount of code; instead, other functions (fscanf, for one) do its job without the problem.

gets takes a pointer to a C-style string (a char array) as its sole argument; it reads from the console into the string until the next end-of-line. That doesn't sound so bad until you realize that gets has no way of knowing how much space you allocated for the string, and will cheerfully keep writing as long as it's getting non-newline characters.

In Java, that would be an ArrayIndexOutOfBoundsException. In C (or C++), it's much worse. Array bounds are not checked in those languages: an array is effectively just a pointer to the first element, and pointer arithmetic is used to figure out where the nth element is when requested, so asking for an index beyond the size of the array (or a negative one, for that matter) cheerfully yields a reference to some other spot in memory. If that space isn't allocated to the program, the program halts abruptly with a segmentation fault (SIGSEGV) on a POSIX system or a general protection fault under Windows XP; most earlier versions of Windows won't stop it at all and will just let the system behave in an undefined manner. (NT might be okay; I'm not sure what its memory model looks like.)

gets doesn't know when to quit and can't be told when to quit. The bug is inherent to the design of the function, and I'd say it's quite right that a program using gets won't even compile. A carefully crafted string longer than the buffer (gets can't dynamically allocate a bigger one, after all; it's not getline) can overwrite little things like the return address of the current stack frame, causing the instruction pointer to jump onto the stack into the rest of the string, executing whatever code is in the input string interpreted as machine instructions. This is the heart of a buffer overflow attack.

This was actually the third lab I had in CS 361: plot and execute a buffer overflow attack against a program that uses gets. (We were given the program, obviously.)

Ah, yes... that's a general problem with memory management: storing variables of unknown length. Pretty much the only thing to be done is to allocate more memory than you think will be needed, and throw an error if it won't fit.

I see what you mean about gets. Although it's wild speculation, I would think that NT 4 would behave similarly to Windows XP, since XP is based on NT's architecture? Of course, the latest versions of Windows (and other OSes) support the hardware NX bit, which can prevent buffer overruns from becoming major security vulnerabilities.

As an aside, while I was working for Elekta I was asked to write a couple of test tools. I only had access to Visual Studio.NET, so took some time learning about .NET, but the tools were pretty easy. One, to create files to fill up the disk, has already been superseded by a C++ program which simply allocates a file of the required size, while mine actually writes out data at about 7MB/s, making it much slower. My audit trail viewer works well though, and .NET makes it very easy to connect to an SQL database and pretty easy to put it in a table with automatic sorting support, although there are too many objects called Data* (DataSet, DataTable, DataGrid, DataView...).

The problem is that .NET programs are not allowed to run across the network (including from a drive you yourself have mapped on the LAN) if they do anything relating to the computer they're running on, apart from using Windows.Forms widgets. That includes read-only querying for free space, so the file writing that both programs do is definitely not allowed. If run on a machine with VS.NET, you get a somewhat helpful error message about being unable to grant permissions, but on a machine just with the .NET framework runtime, it throws an exception. This is a security feature which can only be overridden by manually going to the .NET configuration on the client computer.

.NET programs are not allowed to run across the network

I find that very ironic somehow.

I believe you are correct that Windows NT has a segmented memory model and will probably behave like XP.

You are not correct that the only way to deal with that sort of situation is to guess "enough" and fail if it is not, indeed, enough. getline deals with it with a very clever recursion. As I understand it, getline declares an on-the-stack buffer of n characters. When n isn't enough, it calls itself with its size_t parameter incremented by one (the parameter defaults to zero and shouldn't be touched by the original caller). When something finally hits a newline, it looks at the parameter, multiplies by n, adds the number of characters that it itself found, allocates just enough, and then starts writing to the buffer backwards from the end.

Clever, no? All of it with only one call to malloc.

That is indeed a clever solution to the problem. Perhaps a problem on earlier machines with limited memory, but fine now.

All large string manipulation is expensive, but you're right that getline is particularly smelly; it requires memory approximately equal to twice the size of the string...
