Islanded in a Stream of Chars

From the “things that really shouldn’t be difficult, but for some reason are anyway” department comes the following. Do you think you know how to program in C++? Familiar with objects and polymorphism and templates and everything? Then this should be dead easy. Should, I said.

Problem: Write a function that takes in a std::istream and a size n and returns a std::string. The string should contain the first n characters of the input stream, with all formatting (whitespace, newlines, etc.) preserved.

You can ignore all concerns about multi-byte characters for the sake of this problem. Sounds simple, right? You’d be able to crank this out in ten seconds if someone asked you this in an interview, right? Okay, now try it with this caveat.

Caveat: You must do this in a purely C++ “style”. To be precise, you must do this without using any character variables or character arrays. Use only a std::string object (or some other memory-managed object in the standard library) as your input buffer.

For as much as the C++ STL tries to encourage you to use RAII-oriented containers instead of raw arrays, this seemingly trivial task requires some surprisingly baroque coding. If you want to test yourself, try writing the function before you read on.


As much as we’d all like it to be, the following is not the right answer:

std::string extractStr(std::istream& in, std::streamsize n)
{
  std::string str;
  in.get(str, n);
  return str;
}

The main reason that this is so much harder than it needs to be is that the istream::get() function does not provide an overload that reads directly into a string. You have only three choices if you go that route. You may either read character-by-character, or you may read into a character array, or you may read into a streambuf object. No strings for you.

A streambuf, you say! Aha! Well you may happen to remember that there is a standard class called std::stringbuf which derives from streambuf, and you could read into that and then extract the string. The problem with this, though, is that unlike the istream::get() overloads that use character arrays, the overloads that use streambufs conveniently leave out an optional size parameter. If you want to read from the stream with a stringbuf, you are obligated to read everything it has to give you, up to some delimiter. The istream class’s other unformatted data-reading functions, read() and readsome(), don’t give you any choice other than character arrays. So using an istream member function is right out.
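To see the limitation concretely, here is a minimal sketch of the stringbuf route. The helper name readLineViaStringbuf is made up for this illustration; the only thing this overload of get() lets us choose is the delimiter, not a character count.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, for illustration only.
// Reads everything up to (but not including) the next newline.
std::string readLineViaStringbuf(std::istream& in)
{
  std::stringbuf buf;
  in.get(buf, '\n');  // no get(streambuf&, ...) overload accepts a size limit
  return buf.str();
}
```

Whatever n we had in mind, the call above stops only at the delimiter or at end-of-stream, and the delimiter itself stays in the stream.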

What to do instead, then? We can turn to every C++ programmer’s best buddy, the iterator. istream objects can do more than stream extraction. Much like everything else in the C++ standard library, they have iterators. A really brute-force way to write this function is then to do this:

std::string extractStr(std::istream& in, std::streamsize n)
{
  std::string str;
  for(std::istreambuf_iterator<char> i(in), end; i != end && str.length() < n; ++i)
  {
    str += *i;
  }
  return str;
}

Note the end-of-stream check in the conditional section of the for statement. Dereferencing or incrementing the end-of-stream iterator is not valid, so we have to compare against a default-constructed istreambuf_iterator (end, above) at each iteration to make sure that there is still something to read. Testing the istream itself (it can be implicitly converted to bool, by means of a conversion to void*) would not help here: an istreambuf_iterator reads straight from the stream’s underlying streambuf and never sets the stream’s eofbit.
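As a quick sanity check, here is a self-contained sketch of the loop approach that can be compiled and exercised on its own. It compares the iterator against a default-constructed end-of-stream iterator, so a short stream simply yields a shorter string. The name extractStrLoop and the sample inputs are made up for this illustration.

```cpp
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical name, to keep it apart from the article's extractStr.
std::string extractStrLoop(std::istream& in, std::streamsize n)
{
  std::string str;
  const std::istreambuf_iterator<char> end;  // default-constructed: end-of-stream
  for (std::istreambuf_iterator<char> i(in);
       static_cast<std::streamsize>(str.length()) < n && i != end; ++i)
  {
    str += *i;  // whitespace and newlines are copied like any other character
  }
  return str;
}
```

On a stream holding "a b\nc d", asking for 4 characters yields "a b\n", and asking for more characters than remain yields whatever is left.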

We could add a call to string::reserve() to make the above slightly more efficient, but efficiency aside, the above function is aesthetically gross. How might we make this look a bit more elegant, and be more expressive of what we’re trying to accomplish (initializing a string with the first n characters of a stream) and not so explicitly expressive of the mechanics of how that gets done?

You might remember that std::string has a constructor which takes two input iterators and uses them to construct the string. This is a really easy way to initialize a string with the whole contents of a stream, for example if your istream in is really an ifstream and you want the entire file read into a string.

std::string str((std::istreambuf_iterator<char>(in)),
                std::istreambuf_iterator<char>());

Two notes on the above: First, the parentheses around the first argument are, unfortunately, necessary. This is to prevent the parser from mis-parsing this line as a function declaration, much like how you cannot use empty parentheses for default-constructing a variable without new. Second, the second argument, a default-constructed istreambuf_iterator, is a special value which represents end-of-stream for any input stream. This specialness is why this pattern, while it works great for reading the whole stream, doesn’t work at all for reading only a fixed number of characters. What happens when you try to use this string constructor to solve the problem I initially posed?

std::string str((std::istreambuf_iterator<char>(in)),
                std::istreambuf_iterator<char>(in) + n);

The compiler doesn’t like that. It will tell you that there is no operator+ defined for istreambuf_iterators, and it will be right. Remember, istreambuf_iterators are not models of random-access iterators. Okay, so why don’t we just then use the old standby std::advance, even if it might be a little inefficient?

std::string extractStr(std::istream& in, std::streamsize n)
{
  std::istreambuf_iterator<char> begin(in);
  std::istreambuf_iterator<char> end(in);
  std::advance(end, n);
  return std::string(begin, end);
}

A little prettier, but unfortunately it doesn’t work. It does compile, but it will give you an empty string at best and a segmentation fault at worst. The use of begin after we have called std::advance on end is undefined behavior. This is because istreambuf_iterators are not just not models of random-access iterators, they aren’t even models of forward iterators. They are only models of input iterators. That means that you can only move forward, and once you move forward you can never go back, even if you’ve saved a previous iterator like we did above. Input iterators only guarantee that you may pass over the range once, and that makes sense given the nature of streams.
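One way to see where the empty string comes from: two istreambuf_iterators compare equal whenever neither of them is at end-of-stream, no matter how far either one has been advanced. The sketch below checks exactly that comparison, which is the first thing the string constructor would evaluate. The helper name and inputs are made up for illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical demo helper: builds two iterators on the same stream,
// advances one of them, and reports whether they still compare equal.
bool equalAfterAdvance(const std::string& text, int steps)
{
  std::istringstream in(text);
  std::istreambuf_iterator<char> begin(in);
  std::istreambuf_iterator<char> end(in);
  std::advance(end, steps);  // consumes characters from the shared streambuf
  return begin == end;       // true while neither one is at end-of-stream
}
```

Since begin == end still holds after the advance, a constructor handed that pair sees an empty range.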

If you look around at other standard library functions that might fit the bill instead of string’s constructor, similar problems arise. The string object’s iterator-based append() overload needs an explicit end iterator just like the constructor does. The std::copy() function can work with input iterators, but it too requires an explicit end iterator, which we can’t provide except for the special end-of-stream iterator. For some reason, unlike std::fill() / std::fill_n() and std::generate() / std::generate_n(), the current (C++03) standard has no such function as std::copy_n(). It’s almost as if the authors of the standard library are teasing us!

Just to spite the standards authors, here’s something clever you could do:

std::string extractStr(std::istream& in, std::streamsize n)
{
  std::string str;
  str.resize(n);
  in.read(&str[0], n);
  str.resize(in.gcount());  // trim if the stream held fewer than n characters
  return str;
}

This will actually work, except that, strictly speaking, it is also undefined behavior. Every modern standard library implementation makes std::string’s internal storage contiguous, so unlike the string constructor example, this will probably work in practice. But, rather surprisingly, std::string is not required to have contiguous internal storage by the current C++ standard. This will be required of std::string in C++0x, and so the above will be legal in C++0x, but while it is convenient, it is currently not standards-conforming.

Sadly, as far as I can tell, there is absolutely no standards-conforming way to write this function without raw character arrays, other than by explicitly writing out the nuts and bolts of a character-by-character iteration over the istream or doing something even more long-winded like writing a wrapper around istreambuf_iterator that returns an end-of-stream iterator after a fixed number of advances. I’ll repeat the “brute force” solution (with the small optimization included) below, and if anyone can find a more elegant way to accomplish this seemingly trivial task, please post it in the comments. It’s things like this that sometimes make me think that all the C++ nay-sayers out there might be on to something.

std::string extractStr(std::istream& in, std::streamsize n)
{
  std::string str;
  str.reserve(n);
  for(std::istreambuf_iterator<char> i(in), end; i != end && str.length() < n; ++i)
  {
    str += *i;
  }
  return str;
}
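For the curious, the wrapper idea mentioned above could be sketched roughly as follows. Everything here (the class name CountedStreamIterator, the helper extractStrCounted) is made up for illustration and only lightly exercised, so treat it as a starting point rather than a finished iterator.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical wrapper, not part of the standard library: an input iterator
// that reports "end" after n characters or at the real end of the stream,
// whichever comes first.
class CountedStreamIterator
{
public:
  typedef std::input_iterator_tag iterator_category;
  typedef char                    value_type;
  typedef std::ptrdiff_t          difference_type;
  typedef const char*             pointer;
  typedef char                    reference;

  // Default-constructed: the end-of-range marker.
  CountedStreamIterator() : it_(), remaining_(0) {}
  CountedStreamIterator(std::istream& in, std::streamsize n)
    : it_(in), remaining_(n) {}

  char operator*() const { return *it_; }

  CountedStreamIterator& operator++()
  {
    ++it_;          // consumes one character from the stream buffer
    --remaining_;
    return *this;
  }

  // Postfix increment hands back the character that was current.
  struct Proxy
  {
    char value;
    char operator*() const { return value; }
  };
  Proxy operator++(int)
  {
    Proxy old = { **this };
    ++*this;
    return old;
  }

  // Exhausted once the quota is used up or the stream has nothing left.
  bool exhausted() const
  {
    return remaining_ <= 0 || it_ == std::istreambuf_iterator<char>();
  }

  friend bool operator==(const CountedStreamIterator& a,
                         const CountedStreamIterator& b)
  { return a.exhausted() == b.exhausted(); }
  friend bool operator!=(const CountedStreamIterator& a,
                         const CountedStreamIterator& b)
  { return !(a == b); }

private:
  std::istreambuf_iterator<char> it_;
  std::streamsize remaining_;
};

// With the wrapper in place, the function body is a single expression.
std::string extractStrCounted(std::istream& in, std::streamsize n)
{
  return std::string(CountedStreamIterator(in, n), CountedStreamIterator());
}
```

The function itself ends up as the one-liner we wanted all along, but only at the cost of some fifty lines of iterator boilerplate, which rather proves the point.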

This post was inspired by this question on StackOverflow. My answer to this question in part encourages the questioner to use C++-style I/O rather than C-style I/O, but shortly after posting I realized that I did not quite know how to do what he wanted in a truly “C++ style”. I still don’t.

