The existing answers already explain that telnet can do more than just stream input data over TCP unchanged; I'd like to instead focus on the "My dumb chat client program doesn't create an [application] protocol" part of the question. (As an aside, for actual "raw data over TCP", you can use netcat instead of telnet.)
Just because a protocol is simple or bad - or not even well-specified anywhere - doesn't mean it's not a protocol. Two systems can't communicate without an agreed-upon protocol, even if that protocol is just "raw data over TCP". If you try speaking FTP on one end and HTTP on the other, you're going to have problems one way or another - if you're lucky, it simply won't work; if you're not, you'll get wrong behaviour. (It at least used to be the case that if you ran an FTP daemon on a non-standard port on the same host as an HTTP server, on some browsers you could use that to perform XSS and bypass CSRF protections: point the victim's browser to POST to the FTP port, the FTP daemon echoes the POSTed content back in error messages, and the browser interprets that as an HTTP/0.9 response - and happily runs any JavaScript in it.)
It is a perfectly valid question to ask "What protocol does that thing you call my dumb chat client program use? I want to talk to it from my program." and one possible answer would be "raw data over TCP". It's not an exhaustive answer, however.
- If you have some default port, that should probably be part of the answer.
- If you wanted to be extremely clear, you could specify that you mean "normal TCP data, not TCP urgent/out-of-band data" (which is a bit of a stretch, since that wouldn't make any sense here, and judging by the number of people I've seen who think the third fd set of `select()` is for errors, most people don't even seem to know out-of-band data exists - but even if it's a bit of a bad example, I hope you see what I'm getting at)
- Even things you might not have thought about can implicitly be part of the protocol: How do you close the connection? Do you just call `close()`, or do you use `shutdown()` to make sure all pending data is delivered first?
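To illustrate that last point, here's a hedged sketch in Python (the host, port, and function name are made up for the example): `shutdown(SHUT_WR)` sends a FIN so the peer sees end-of-file, while still letting us read the reply; a bare `close()` on a socket with unread data can discard it or even cause an RST, depending on the stack and socket options.

```python
import socket

# Hypothetical sketch of a "raw data over TCP" exchange with a clean
# shutdown. Host/port/payload are placeholders, not from the answer above.
def send_and_finish(host: str, port: int, payload: bytes) -> bytes:
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(payload)
        # Half-close: "I'm done writing" - the peer sees EOF on its read
        # side, but we can still receive everything it sends back.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed its side: transfer complete
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        sock.close()
```

Whether the other side expects this half-close dance - or treats EOF as an error - is, again, part of the protocol.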
You said "open a TCP stream socket for raw input/output? That's not a rule.", but is that really so? "You must use TCP", "you must send the data as-is" and "you should use port XXX" sound like rules to me.
(P.S. I got a bit swept up in writing this; the rest of this answer is a bit tangential, and tries to answer the natural follow-up question: "Okay, so everything is a protocol - does that have any implications?")
So, keeping in mind that every time you make two systems talk to each other, you're either using an existing protocol or creating a new one: document your protocols before it's too late! (I.e. at the latest when the protocol grows beyond simple, or starts having more users than just the developers.) E.g. many early P2P protocols said "the source of the official client is the documentation", which is extremely annoying in a complex project, and just leads to clients speaking subtly different protocols and being incompatible here and there.
There's the old adage of "be strict in what you produce and liberal in what you accept" - the core sentiment is nice, but frankly, at least the way it tends to be applied, I think that it's bullcrap, and only leads to more problems in maintainability and compatibility. I say: Specify in detail (try to think of every corner case), and be extremely strict in what you produce and accept, and you'll likely prevent/solve real bugs.
An example, about strictness solving bugs: Years ago, a certain bittorrent tracker began having a problem: Some of their torrents didn't work on some clients; the clients didn't complain, they just got a different infohash, and thus couldn't find the torrent. The tracker operators couldn't figure out what was going on, and declared that the affected clients must be buggy - stop using them. Bittorrent uses bencoding, which is very strictly and well specified - and for a good reason: the same input always results in the exact same encoded output string, so that the [info]hash is always the same for the same input data. When I first bumped into one of those torrents, I tried feeding it into my own - very strict - parser, and straight away it said: `Error: Bencoded dictionary keys not sorted (last='sha1', key='private')`.
So, what happened? The tracker had started automatically adding the key "private" to each torrent that was uploaded, but just appended it at the end, instead of keeping the dictionary keys sorted like bencoding specifies - it wasn't strict in what it produced. Some clients decoded the torrent only partially, and calculated the hash over the undecoded, unaltered string. Other clients decoded the whole torrent and re-encoded the part they needed to hash, sorting the keys correctly in the process (creating a valid bencoding), and got a different hash. To my knowledge/recollection, no client complained that the data was invalid - the clients weren't strict in what they accepted. So, which hashes were correct? Neither! Both ways of calculating the hash are correct, but the torrents were corrupted! If even one of the clients had complained about it, the bug would've likely been fixed the same day, instead of taking months. (IIRC I also reported an identical bug in another completely unrelated tracker some years later; the software of the first one was made in-house, so it wasn't the same codebase either.)
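To make the "strict in both directions" point concrete, here's a minimal sketch of bencoding in Python - not a full implementation, just enough to show both halves: the encoder always emits dictionary keys in sorted order, and the decoder rejects input whose keys aren't sorted, which is exactly the check that caught the tracker bug above.

```python
# Minimal bencoding sketch: int, bytes, list, dict (keys must be bytes).
def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        out = [b"d"]
        for key in sorted(value):  # be strict in what you produce
            out.append(bencode(key) + bencode(value[key]))
        out.append(b"e")
        return b"".join(out)
    raise TypeError(f"cannot bencode {type(value)!r}")

def bdecode(data: bytes):
    value, rest = _decode(data)
    if rest:
        raise ValueError("trailing data after bencoded value")
    return value

def _decode(data: bytes):
    if data[:1] == b"i":
        end = data.index(b"e")
        return int(data[1:end]), data[end + 1:]
    if data[:1] == b"l":
        items, data = [], data[1:]
        while data[:1] != b"e":
            item, data = _decode(data)
            items.append(item)
        return items, data[1:]
    if data[:1] == b"d":
        result, data, last_key = {}, data[1:], None
        while data[:1] != b"e":
            key, data = _decode(data)
            if last_key is not None and key <= last_key:
                # be strict in what you accept
                raise ValueError(
                    f"dictionary keys not sorted (last={last_key!r}, key={key!r})")
            value, data = _decode(data)
            result[key] = value
            last_key = key
        return result, data[1:]
    if data[:1].isdigit():
        colon = data.index(b":")
        length = int(data[:colon])
        start = colon + 1
        return data[start:start + length], data[start + length:]
    raise ValueError("invalid bencoded data")
```

A decoder like this fails loudly on a dictionary like the tracker produced (`private` appended after `sha1`), instead of silently hashing inconsistent data.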
Another example, this time about corner cases: The eDonkey2000 P2P protocol used a custom hash called ed2k; the specification didn't specify the behaviour in the corner case of the input length being exactly divisible by the block size, so two incompatible variants were born, which produce different hashes for some files. (I said "specification", but since this was an early P2P protocol, I'm pretty sure there wasn't one; the special case was probably missed when reverse-engineering the protocol.)
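A simplified illustration of how such a divergence looks in code - NOT the real ed2k algorithm (that uses MD4 and 9,728,000-byte blocks; here I use SHA-1 and a tiny block size so the corner case is easy to trigger). The two functions disagree only when the input length is an exact multiple of the block size:

```python
import hashlib

BLOCK = 4  # artificially tiny block size for the illustration

def _h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def hash_variant_a(data: bytes) -> bytes:
    """Interpretation A: stop at the last full block; a file that is an
    exact multiple of BLOCK gets no trailing empty block."""
    if len(data) <= BLOCK:
        return _h(data)
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return _h(b"".join(_h(b) for b in blocks))

def hash_variant_b(data: bytes) -> bytes:
    """Interpretation B: a file that is an exact multiple of BLOCK gets
    an extra zero-length block appended to the block list."""
    if len(data) < BLOCK:
        return _h(data)
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    if len(data) % BLOCK == 0:
        blocks.append(b"")  # the unspecified corner case, resolved differently
    return _h(b"".join(_h(b) for b in blocks))
```

Both readings are perfectly defensible given an underspecified description - which is exactly why the corner case needed to be written down.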
About the need for detailed specifications: Many times, when trying to write a client for a protocol or a parser for a file format, I've encountered a situation where even the original programmers have no idea how some data is actually encoded, because they've used some library, and have never checked in full detail what it actually does. Two examples:
A certain API used a custom UDP protocol with encrypted [variable-length] packets - the documentation specified AES128, but didn't mention the padding used. When the developers were queried about it, the response was along the lines of:
> Uh, we actually have no idea, we just used this function in this Java library, and the documentation doesn't seem to mention which padding it uses
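And padding really is part of the protocol: AES is a 16-byte block cipher, so variable-length packets must be padded, and both sides must agree on how. A hypothetical illustration of two common schemes, in pure Python (the cipher itself is irrelevant to the point, so it's omitted):

```python
BLOCK = 16  # AES block size in bytes

def pad_pkcs7(data: bytes) -> bytes:
    # PKCS#7: always adds 1..16 bytes, each byte equal to the pad length,
    # so unpadding is unambiguous.
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def unpad_pkcs7(data: bytes) -> bytes:
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]

def pad_zero(data: bytes) -> bytes:
    # Zero padding: pads with NUL bytes only when needed - ambiguous if
    # the payload itself may end in NULs.
    n = (-len(data)) % BLOCK
    return data + b"\x00" * n
```

Encrypt with one scheme and decrypt assuming the other, and you'll get garbage trailing bytes or spurious "corrupt packet" errors - the kind of detail a specification has to pin down.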
The website of a certain device has this to say about interfacing with their savefiles:
> It is not possible to create or modify a saved *.logicdata files. That is because the files are generated with boost binary serialization and contain a very complex object structure. In truth, we don't even understand how it works. Boost serialization is extremely complex.
(Note: In reality, though, it doesn't take that long to reverse-engineer enough of the file format to be able to export the data)
Some of the examples were about encodings, file formats or custom algorithms (though still used inside protocols, except for the last one), but I'd say all of this applies to every one of them. Whether you're talking about a protocol, a file format, an encoding or a custom algorithm:
- Write a good specification / document it well
- Try to think about all the corner-cases
- Try to include all the needed details
- Follow the specification to the letter / be strict