I understand the difference between the two so there's no need to go into that, but I'm just wondering what the reasoning is behind why Windows uses both CR and LF to indicate a line break. It seems like the Linux method (just using LF) makes a lot more sense, saves space, and is easier to parse.
-
Newline#History – user142162, Commented Jun 29, 2011 at 13:50
-
It may be worth noting that CRLF on Windows is mostly just a convention/default. Most programs support either (though you might have to mess with the settings). I personally almost never use CRLF, opting instead for the UNIX-style LF; only a handful of programs still have problems with files that just use LF. – Kevin, Commented Feb 15, 2017 at 15:11
-
CR+LF is the correct way to do it (it is the standard), so the question isn't why Windows does it correctly but why Mac and Unix/Linux do it incorrectly. Standalone LF's legacy is laziness and taking a shortcut. I always use CR+LF, except for certain Linux things that balk at CR+LF, so I change to LF mode for that. IMO, misinterpreting CR+LF is a lot worse than misinterpreting a standalone LF. – InterLinked, Commented Apr 15, 2020 at 14:46
-
That Newline#History article seems to suggest that CR+LF is the standard according to ASA. The ISO standard seems to support both LF and CR+LF. So I guess life is more nuanced, @InterLinked :) – chhabrakadabra, Commented Jul 22, 2020 at 15:39
-
@TwistedCode Indeed, I do use CR without LF in some of my own programs. It's useful to go back to the beginning of the line without going to the next one. They usually go well together, but each can be used on its own. CR on its own is more useful than LF on its own, though. – InterLinked, Commented Sep 4, 2021 at 11:28
6 Answers
Historically, when using teletypes, CR would return the carriage to the first position of the line while LF would feed to the next line. Using CR+LF in the files themselves made it possible to send a file directly to the printer, without any kind of printer driver.
Thanks @zaph for pointing out it was teletypes and not dot-matrix printers.
@sshannin posted a URL from Raymond Chen's blog, but it doesn't work anymore. The blog has changed its internal software, so the URLs changed.
After crawling through the old posts in the new blog I've found it here.
Quote from the blog:
Why is the line terminator CR+LF?
This protocol dates back to the days of teletypewriters. CR stands for “carriage return” – the CR control character returned the print head (“carriage”) to column 0 without advancing the paper. LF stands for “linefeed” – the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you need both CR and LF.
If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you’ll see that they all specify CR+LF as the line termination sequence. So the real question is not “Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?” but rather “Why did other people choose to differ from these standards documents and use some other line terminator?”
Unix adopted plain LF as the line termination sequence. If you look at the stty options, you’ll see that the onlcr option specifies whether a LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where each line begins where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.
The unix ancestry of the C language carried this convention into the C language standard, which requires only “\n” (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.
The C language also introduced the term “newline” to express the concept of “generic line terminator”. I’m told that the ASCII committee changed the name of character 0x0A to “newline” around 1996, so the confusion level has been raised even higher.
Here’s another discussion of the subject, from a unix perspective.
I've changed this second link to a snapshot in The Wayback Machine, since the actual page is not available anymore.
I hope this answers your question.
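The text-mode translation the quote describes is easy to observe from a language with C-style I/O. Here is a minimal Python sketch (the filename is arbitrary): forcing CRLF on write shows the terminator the runtime substitutes for a logical "\n", and reading back in text mode shows "universal newlines" folding everything back to "\n".

```python
import os
import tempfile

# In text mode, the runtime maps a logical "\n" to a line terminator;
# passing newline="\r\n" forces the CRLF convention on any platform so
# the bytes on disk are predictable.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", newline="\r\n") as f:
    f.write("one\ntwo\n")

with open(path, "rb") as f:
    raw = f.read()           # the raw bytes actually stored
print(raw)                   # b'one\r\ntwo\r\n'

# Reading back in text mode uses "universal newlines": CRLF, bare CR,
# and bare LF are all folded into a logical "\n".
with open(path) as f:
    text = f.read()
print(text == "one\ntwo\n")  # True
```

This is exactly the "burden on the runtime libraries" the quote mentions: the program only ever sees "\n", and the library converts at the file boundary.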
It comes from the teletype machines (and typewriters) from the days of yore.
It used to be that when you were done typing a line, you had to move the typewriter's carriage (which held the paper and slid to the left as you typed) back to the start of the line (CR). You then had to advance the paper down a line (LF) to move to the next line.
There are cases you might not have wanted to linefeed when returning the carriage, such as if you were going to strikethrough a character with a dash (you'd just overwrite it).
But basically, it boils down to convention. DOS used the full CR/LF convention, and UNIX shortened it a bit. Now we're stuck!
From Wikipedia:
The sequence CR+LF was in common use on many early computer systems that had adopted teletype machines, typically an ASR33, as a console device, because this sequence was required to position those printers at the start of a new line.
Line ending standards developed decades ago alongside the development of hardware (e.g. printers), operating systems, communication protocols, languages like C, etc. From a modern perspective, it would have been preferable for one standard to have "won". But now we are stuck. Fortunately, modern applications, like editors and source repository tools, generally handle interoperability well.
As others point out, Windows is not unique in choosing CRLF over a single character line ending like LF or CR.
For compatibility with existing systems when DOS was developed in the early 80s, CRLF was chosen.
- CR has the semantic meaning of "move the cursor to start of line" (aka pressing HOME on your keyboard)
- LF has the semantic meaning of "move the cursor down one line" (aka pressing the DOWN arrow)
- CRLF therefore means "move to the start of a new line"
We can implicitly add CR before LF-only text, or LF after CR-only text, but it's easy to forget that CR or LF alone is actually ambiguous.
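A concrete illustration of that ambiguity, sketched in Python: a naive split on "\n" leaves stray CRs behind in CRLF text and misses CR-only breaks entirely, while str.splitlines() recognizes all three conventions.

```python
# Text of unknown origin, mixing all three line-ending conventions.
mixed = "unix\nwindows\r\nold-mac\rend"

# Naive: splitting on "\n" alone leaves "\r" debris and does not
# break the CR-only line at all.
naive = mixed.split("\n")
print(naive)       # ['unix', 'windows\r', 'old-mac\rend']

# splitlines() treats LF, CRLF, and bare CR as line boundaries, so it
# is a convenient way to normalize to a single convention.
normalized = "\n".join(mixed.splitlines())
print(normalized)  # unix\nwindows\nold-mac\nend
```

Tools that accept text from mixed sources typically normalize like this before doing anything else with it.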
*NIX systems are beautiful, the rightful OS of choice for many developers. This is a rare example where Windows favoured clarity of intention over brevity...and made the better design choice.
I have seen more than one account to the effect that the reason to send two characters (and sometimes more) instead of one was in order to better match the data transfer rate to the physical printing rate (this was a long time ago). Moving the print-head took longer than printing a single character and sending extra characters was a way of preventing the data transfer from getting ahead of the printing device. So the reason we have multiple characters for end-of-line in Windows is basically the same as the reason we have QWERTY keyboards -- it was intended to slow things down.
Obviously the reason this practice continues in Windows to this day is based on some notion of ongoing backwards compatibility, and ultimately, just simple inertia.
Of note however, this convention is not strictly enforced by Windows at the operating system level. Any Windows application is free to ignore the convention, depending on what other applications it is trying to be compatible with.
Interestingly, the Wikipedia article about "Newline" claims that Windows 8 may introduce a change to using only LF.