department of hack
1508 stories

git diff's Deep Deception

But try to understand
Try to understand
Try try try to understand
Git’s a magic command.

Heart  💕

I knew git stored diffs somewhere. I mean, it’s obvious—right?

All git ever shows the casual user is a diff! My pull requests were diffs. git show: diff. git diff? Duh! It’s a diff, too.

But later, I learned the truth—git’s interface belies its internals. There’s a mismatch between what git shows you vs. how git works.

It’s challenging to wield git’s interface when your mental model of the internals is broken. And after I corrected my mental model of git’s internals, I was able to stop relying so heavily on git’s truly terrible interface.

In this post, I’ll attempt to explain all the deep details of git diff to my past self.

🫠 Git add makes blobs

We can add files to repos using git add. But behind the porcelain, git’s busy compressing and storing this file deep in its bowels. Git terms the results of this process a “blob.”

Git stores blobs (among other things) inside the .git/objects directory.

$ git init
Initialized empty Git repository in /tmp/bar/.git/
$ echo "Hi, I'm blob" > foo
$ git add foo
$ tree .git/objects/
└── 26
  └── 45aab142ef6b135a700d037e75cd9f1f1c94dc

But what’s in a blob? And why is this blob stored as ./26/45aab142ef6b135a700d037e75cd9f1f1c94dc?

🗄️ Git stores things by their hash

Why did git add foo store the contents of foo as 2645aab142ef6b135a700d037e75cd9f1f1c94dc?

Git mapped our file to a number via a hash function.

A hash function maps data to a fixed-size number that is, for all practical purposes, unique: whenever the data changes, even by a single bit, the hash function’s output changes dramatically.

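SHA1 is the hash function git uses by default. When we git add foo, git doesn’t hash the bare contents of foo. It prepends a small header naming the object type and the content length in bytes (blob 13, then a NUL byte) to the contents, Hi, I'm blob\n, and applies SHA1 to the whole thing. That spits out 2645aab142ef6b135a700d037e75cd9f1f1c94dc.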

Blobs are all about content. The filename “foo” doesn’t matter at all! We could have named the file “🌈”—git still would have stored it in the same place. If the file contents are EXACTLY the same, then the hash will be exactly the same.
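Here is a minimal Python sketch of the hashing step, assuming the default SHA1 object format; hash_blob and loose_object_path are hypothetical helper names, not part of git. It computes what git hash-object computes for a file: SHA1 over a blob <size> header, a NUL byte, and the raw contents. The filename never enters the calculation.

```python
import hashlib

def hash_blob(content: bytes) -> str:
    """Return the object name git would give a blob with these contents.

    Git hashes a header ("blob", a space, the decimal byte length, a NUL)
    followed by the raw contents. The filename is not part of the input.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(name: str) -> str:
    """Split a 40-hex-digit object name into git's loose-object path:
    the first two digits name a directory, the other 38 name the file."""
    return f".git/objects/{name[:2]}/{name[2:]}"

if __name__ == "__main__":
    # Same bytes as `echo "Hi, I'm blob" > foo`, including the trailing newline.
    name = hash_blob(b"Hi, I'm blob\n")
    print(name)
    print(loose_object_path(name))
```

Comparing hash_blob on the bytes of a file with git hash-object on the same file is an easy way to check the sketch against a real repository.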

🌱 Git commit creates commits and trees

You already know git commit creates a commit, but what is a commit?

A commit is a type of object. Git uses the word “object” to mean: a commit, a folder or directory (tree), a file (blob), or a tag. Git stores objects in its object database—everything inside the .git/objects directory.

$ git commit -m 'Initial Commit'
[main (root-commit) 0644991] Initial Commit
1 file changed, 1 insertion(+)
create mode 100644 foo
$ tree .git/objects/
├── 06
│   └── 449913ac0e43b73bfbd3141f5643a4db6d47f8
├── 26
│   └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
└── 41
  └── 81320a57137264d436b2ef861c31f430256bf4

After our commit, the object database has three objects: 06449913, 2645aab1, and 4181320a.

So now we’ve established that one of these three objects is our blob (2645aab1)—let’s see if we can suss out the others.

✨ The magic command

The magic command to learn about any object is git cat-file -p. We can use that command to find out more about our mystery objects:

$ git cat-file -p 06449913ac0e43b73bfbd3141f5643a4db6d47f8
tree 4181320a57137264d436b2ef861c31f430256bf4
author Tyler Cipriani <> 1652310544 -0600
committer Tyler Cipriani <> 1652310544 -0600

Initial Commit

This object (06449913) appears to be our commit. A commit is metadata compressed and stored inside git’s object database.
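To see what git cat-file -p has to undo, here is a minimal Python sketch of reading a loose object, assuming it has not yet been moved into a packfile by git gc (packfiles use a different format that this sketch doesn’t handle). parse_object and read_loose_object are hypothetical names. Loose objects are zlib-compressed, and once inflated each one starts with the same <type> <size> header and NUL byte that went into its hash.

```python
import sys
import zlib
from pathlib import Path

def parse_object(compressed: bytes) -> tuple[str, bytes]:
    """Inflate one loose object and split it into (type, body).

    Once inflated, every object reads b"<type> <size>" + b"\\0" + body.
    """
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("corrupt object: size in header does not match body")
    return obj_type, body

def read_loose_object(git_dir: str, name: str) -> tuple[str, bytes]:
    """Read objects/<first 2 hex digits>/<remaining 38> under git_dir.

    Assumption: the object is still loose. After `git gc`, objects move into
    packfiles, which this sketch does not read.
    """
    path = Path(git_dir) / "objects" / name[:2] / name[2:]
    return parse_object(path.read_bytes())

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python cat_object.py .git 06449913ac0e43b73bfbd3141f5643a4db6d47f8
    obj_type, body = read_loose_object(sys.argv[1], sys.argv[2])
    print(obj_type)
    # Commits and tags are text; trees contain raw 20-byte hashes, so
    # undecodable bytes are replaced instead of raising an error.
    print(body.decode("utf-8", errors="replace"))
```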

Some of the metadata is obvious, but then there’s a tree. And that tree points to our other mystery object, 418132. Let’s see what we can learn about our last remaining mystery object using our magic command:

$ git cat-file -p 4181320a57137264d436b2ef861c31f430256bf4
100644 blob 2645aab142ef6b135a700d037e75cd9f1f1c94dc    foo

So a tree is an object that stores a directory listing of objects by their SHA1s. And a commit is an object that points at a tree by recording the tree’s SHA1!
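A tree’s own name is computed the same way as a blob’s, over a body that lists its entries. Below is a minimal Python sketch, assuming SHA1 and the entry layout git uses for tree objects: <mode> <name>, a NUL byte, then the entry’s object name as 20 raw bytes (not 40 hex characters), with entries sorted by name. hash_object and hash_tree are hypothetical helper names.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> str:
    """SHA1 over git's '<type> <size>' header, a NUL byte, and the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def hash_tree(entries: list[tuple[str, str, str]]) -> str:
    """entries: (mode, name, hex object name), e.g. ("100644", "foo", "2645...").

    Git sorts entries by name, comparing a subtree (mode "40000") as
    though its name ended in "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=sort_key)
    )
    return hash_object("tree", body)
```

Comparing hash_tree([("100644", "foo", "2645aab142ef6b135a700d037e75cd9f1f1c94dc")]) with the tree name that git cat-file printed above is a quick way to test the sketch.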

Commits point to trees, and trees point to blobs and other trees. Neat!

📈 Git’s dependency graph

So if we graphed the state of dependencies in our object database, we’d get something like this:

Simple git repo’s object dependency graph

The commit incorporates our tree, which includes our blob—everything depends on our blob!

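So if we change even a single bit inside a single file, git will notice: the file gets a new blob hash, so the tree that lists it gets a new hash, so the commit that points at that tree gets a new hash. Everything is traceable from the commit down to the bit level, and we get this for free by hashing objects and including those hashes in other objects.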

This is the whole concept of a Merkle Directed Acyclic Graph (Merkle DAG)!
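The Merkle property is easy to demonstrate by chaining the hashes ourselves. This minimal Python sketch assumes a single flat directory, mode 100644 for every file, and a made-up author line, so its commit names will not match the ones in the transcript; commit_files is a hypothetical helper. Flipping one bit in one file yields a different commit name, because the change propagates from blob to tree to commit.

```python
import hashlib

def hash_object(obj_type, body):
    """SHA1 over git's '<type> <size>' header, a NUL byte, and the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def commit_files(files, parent=None, message="Initial Commit"):
    """Hash every blob, one flat tree, and a commit; return the commit name.

    Simplifying assumptions for this sketch: a single directory (no subtrees),
    every file has mode 100644, and the author line is fixed and made up.
    """
    tree_body = b""
    for name in sorted(files):
        blob = hash_object("blob", files[name])
        tree_body += f"100644 {name}".encode() + b"\x00" + bytes.fromhex(blob)
    tree = hash_object("tree", tree_body)

    who = "Example Author <author@example.com> 1652310544 -0600"  # hypothetical
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # commits record their parent's name
    lines += [f"author {who}", f"committer {who}", "", message]
    return hash_object("commit", ("\n".join(lines) + "\n").encode())

if __name__ == "__main__":
    before = commit_files({"foo": b"Hi, I'm blob\n"})
    # 'H' (0x48) -> 'h' (0x68) flips exactly one bit in one file...
    after = commit_files({"foo": b"hi, I'm blob\n"})
    # ...and the blob, the tree, and therefore the commit all get new names.
    print(before)
    print(after)
```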

🍔 So, where’s the diff?

When we type git diff, git presents us a diff. We know there are blobs and trees and commits—so where’s the diff!?

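Git doesn’t store diffs in its object model at all! It derives diffs from the snapshots stored in the object database. (Packfiles do delta-compress similar objects to save space, but that’s a storage detail: when git reads an object back, it gets the complete blob, tree, or commit, not a diff.)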

$ echo "I'm ALSO blob" > baz
$ git add baz
$ git commit -m 'Add baz'
$ tree .git/objects/
├── 06
│   └── 449913ac0e43b73bfbd3141f5643a4db6d47f8
├── 26
│   └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
├── 41
│   └── 81320a57137264d436b2ef861c31f430256bf4
├── 95
│   └── 42599fac463c434456c0a16b13e346787f25da
├── 9b
│   └── 2716e4540c11e8d590e906dd8fa5a75904810a
└── e6
   └── 5a7344c46cebe61d052de6e30d33636e1cd0b4

We made a new commit, and now we have three new objects. We added a new file (blob), which made our directory different (tree), and we committed it (commit).

Our graph now looks like this:

Simple git repo’s updated object dependency graph

You might be surprised by a few things in the graph:

  • Our new commit stores its parent commit as metadata
  • Our new tree points to our old blob, and our NEW blob

So now what happens when we try git diff:

$ git diff 064499..e65a73
diff --git a/baz b/baz
new file mode 100644
index 0000000..9b2716e
--- /dev/null
+++ b/baz
@@ -0,0 +1 @@
+I'm ALSO blob

Git compares the two commits, finds their trees, sees that the second tree lists a new blob (baz) that the first tree lacks, and shows you the diff between /dev/null and baz.
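Here is a rough Python sketch of deriving a diff from two snapshots. It is not git’s implementation: git walks nested trees, can detect renames, and uses its own diff algorithms (Myers by default), whereas this sketch handles flat trees only and swaps in Python’s difflib. diff_trees and the shortened blob names in the example are hypothetical.

```python
import difflib

def diff_trees(old, new, read_blob):
    """Derive a unified diff by comparing two snapshots.

    old and new map file name -> blob name (flat trees only, for brevity);
    read_blob(blob_name) returns that blob's text. No diff is stored
    anywhere: it is computed on demand from the two snapshots.
    """
    out = []
    for path in sorted(old.keys() | new.keys()):
        a, b = old.get(path), new.get(path)
        if a == b:
            continue  # identical blob names mean identical contents: skip
        a_lines = read_blob(a).splitlines(keepends=True) if a else []
        b_lines = read_blob(b).splitlines(keepends=True) if b else []
        out.extend(difflib.unified_diff(
            a_lines, b_lines,
            fromfile=f"a/{path}" if a else "/dev/null",
            tofile=f"b/{path}" if b else "/dev/null",
        ))
    return "".join(out)

if __name__ == "__main__":
    # Blob names shortened; contents taken from the transcript above.
    blobs = {"2645aab1": "Hi, I'm blob\n", "9b2716e4": "I'm ALSO blob\n"}
    first = {"foo": "2645aab1"}
    second = {"foo": "2645aab1", "baz": "9b2716e4"}
    print(diff_trees(first, second, blobs.__getitem__), end="")
```

For the two commits above, it prints the same ---, +++, and @@ lines as git diff, minus git’s diff --git and index header lines. Note that foo is skipped without reading its contents at all, because its blob name is the same in both trees.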

No diffs. Just Merkle DAGs. And now you know.

Thanks to Joe Swanson for providing excellent early feedback on this post. And thanks to Kostah Harlan for reading an early draft of this post and making it less terrible. <3

Read the whole story
1 day ago
Boulder, CO
Share this story



I've long thought we need a fresh look at the life & legacy of Buckminster Fuller. The good news is that it looks like the job is going to be done by a writer up to the task, @nevalalee

Who designed the Julian calendar?

The Roman calendar was a mess before the Romans adopted the Julian calendar in 46 BCE. It was ostensibly lunar, but months ranged from 28 days to 31 days, as they still do today, so the months didn’t line up with lunar phases. It had intercalary months to keep it in step with the seasons, but political intrigues resulted in it being nearly three months out of synch.

In 46 BCE, Julius Caesar cut through the political red tape and decreed a fixed calendar based on a 365¼ day cycle. To do this he had to draw on his powers as both dictator and pontifex maximus, or top priest. It was a phenomenal achievement. The calendar no longer even pretended to be lunar, but no one cared. It was the world’s first solar calendar. And that was even better.

Pliny the Elder claims that one of the experts Caesar employed as a consultant was an astronomer named Sosigenes. I’m here to tell you that while the Julian calendar is magnificent, Sosigenes’ role is overblown. Also, we have no idea where Sosigenes came from.

Sosigenes (Hume Cronyn) gives Cleopatra (Elizabeth Taylor) a biology lesson (Cleopatra, 1963). Perhaps she’s intrigued because he’s writing on paper, not papyrus.

We still use a slightly modified form of Caesar’s calendar today. The modification is so slight that no one alive today remembers the last time our calendar behaved differently from the Julian calendar, and hardly any of us will live to see the next time.

The new calendar was designed so that the equinoxes and solstices would fall on the same date every year. The traditional dates shown below are different from the modern ones, because the Julian calendar isn’t perfect. It’s based on a solar year of 365.25 days, but a solar year is actually 365.2422 days. As a result the solstices slip out of synch by one day every 128 years. To fix this, in 1582 the Gregorian calendar was instituted, as a slight revision to the Julian calendar. It was pinned to the equinox and solstice dates as they were in 325 CE, because 325 CE is when the Nicene formula for calculating the date of Easter was finalised.
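The 128-year figure follows from the two year lengths. A small Python sketch, taking the 365.2422-day solar year from the text as given, computes the average year under each leap-year rule and how many years each needs to drift a whole day. It also shows why 1900 was the last year the two rules disagreed: a leap year under the Julian rule but not under the Gregorian one (the next such year is 2100).

```python
from fractions import Fraction

def is_leap_julian(year):
    return year % 4 == 0

def is_leap_gregorian(year):
    # Century years are leap years only if divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def mean_year(is_leap, cycle):
    """Average calendar year over one full cycle of a leap-year rule."""
    days = sum(366 if is_leap(y) else 365 for y in range(1, cycle + 1))
    return Fraction(days, cycle)

julian = mean_year(is_leap_julian, 4)          # 365 1/4 days
gregorian = mean_year(is_leap_gregorian, 400)  # 365 97/400 days
solar = Fraction("365.2422")                   # solar year, as quoted in the text

# Years for each calendar to run one whole day ahead of the seasons.
print(float(1 / (julian - solar)))     # about 128
print(float(1 / (gregorian - solar)))  # about 3,300
print(is_leap_julian(1900), is_leap_gregorian(1900))  # the rules disagree
```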

Ancient authors give different dates for the solstices and equinoxes: the dates shown here are based on Pliny the Elder, and partly coincide with dates and intervals quoted by Columella and Ptolemy. Other authors — including Caesar — put the solstices and equinoxes on other dates: see below.

Ancient astronomers from Hipparchos onwards were aware that the solar year was slightly under 365¼ days. Ptolemy reports on two sets of observations, one made by Hipparchos, the other by himself, which calculated the error to be approximately one day every 300 years: that is, a solar year of 365.2467 days (Almagest 3.1; 204–206 ed. Heiberg). Either Caesar was unaware of Hipparchos’ observation, or he chose to ignore the discrepancy. Given that the true error is larger than Hipparchos or Ptolemy thought (one day every 128 years), that’s probably just as well.
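The conversion from Hipparchos’ and Ptolemy’s error to a year length, and the comparison with the true drift, are each a single step:

```latex
365\tfrac{1}{4} - \frac{1}{300} = 365.25 - 0.00\overline{3} \approx 365.2467 \text{ days},
\qquad
\frac{1}{365.25 - 365.2422} = \frac{1}{0.0078} \approx 128 \text{ years per day of drift}
```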

Sosigenes of ... where, now?

Here’s what Pliny says about Sosigenes (Natural history 18.211–212):

... Caesar the dictator forced individual years back to the cycle of the sun, employing Sosigenes, who was an expert in his science. ... And Sosigenes himself, though more careful than others in his three treatises, did not stop questioning, since he corrected himself ...

Sosigenes appears in one other place, when Pliny cites him for the statement that the planet Mercury never appears more than 22° away from the sun (Natural history 2.39; actually the maximum elongation of Mercury varies between 18° and 28°).

Look up Sosigenes today, and you’ll often find him called ‘Sosigenes of Alexandria’.

But wait. Pause. Rewind. Take a look at Pliny, and let me remind you he’s our only source for Sosigenes. Do you see any mention of Alexandria?

No, you don’t. The idea that Sosigenes was Alexandrian is entirely a product of the modern imagination. I’ve found it in books as far back as the 1700s, so it’s not very recent, but it’s still a modern fiction.

Note. The misinformation doesn’t stop there. Wikipedia plasters fake Alexandrian and Egyptian connections all over the place, including in the article title, but also gives fake transliterations of his name into Greek — yes, his name is Greek in origin, but it isn’t attested anywhere in Greek, and he could for example be a Roman from southern Italy — and gives three totally fake titles for his lost works. A single editor invented these titles and their Greek versions out of thin air in March 2021, at the same time as adding spurious connections to the Antikythera device.

The tradition of calling Sosigenes ‘Alexandrian’ originates in indirect testimony — not about Sosigenes, but about Caesar himself.

Caesar was in Alexandria in 48–47 BCE, first hunting down Pompey, then bringing down Ptolemy XIII and setting up Cleopatra as sole pharaoh of Egypt. One mediaeval source tells us that he took a strong interest in astronomy during wartime —

(Caesar) says that it was in wartime that he focused on the study of astronomy: he put aside all other thoughts in the war. And the outcome proves that he meant this truly, since his book that he wrote about calculation is not inferior to that of Eudoxus.
Scholium on Lucan, cod. Lips. Rep. 1, N. 10 10.185 (p. 781 ed. Weber)
Note. I cannot trace Weber’s source for this scholium. ‘Lips.’ means that the manuscript is or was in Leipzig, but Weber’s edition dates to 1831, and manuscript shelfmarks at Leipzig no longer look anything like this.

And four ancient sources tell us that Caesar’s new calendar was based on ‘Egyptian teaching’.

And later, based on Egyptian teaching, Gaius Caesar appointed that the period (of a year) was 365¼ days, and that some months should have 30 days, others should have 31, and February should have 28. For in antiquity it was reckoned that each month had 30 days, and that 5¼ should be added to the total.
John Lydus, De mensibus 3.5–6 = p. 40,8–41,2 ed. Wuensch
Note. Similarly Appian, Civil war 2.154; Dion Cassius 43.26; Macrobius, Saturnalia 1.14.3, 1.16.39.

So this must be where the idea of Sosigenes being Alexandrian comes from. Even though none of these writers mentions Sosigenes, and even though no ancient writer connects Sosigenes to Egypt in any way.

Obviously that’s no basis for calling him ‘Sosigenes of Alexandria’. It’s a fabrication, and it needs to stop.

But come to that, should we even take them at their word that Caesar’s source of information was ‘Egyptian’?

According to Macrobius, Caesar drew attention to parallels in the Egyptian calendar himself. But the Julian calendar isn’t an Egyptian product in any sense. There are five reasons I say this:

  1. As we’ll see below, one key principle of the new calendar was minimal alterations to the Roman calendar. The names and positions of the months remained the same, as did the key days of the Kalends, Nones, and Ides; the extra days were distributed across the months that had fewer than 31 days; intercalation took place in the same position as in the republican calendar, that is, after 23 February; and care was taken to make sure the positions of Roman religious observances were unaltered. The Egyptian calendar, by contrast, had 12 months of equal length, with Egyptian names and 30 days each, plus 5 epagomenal (extra) days.
  2. Another key principle was the addition of one intercalary day (leap day) every four years. The Egyptian calendar didn’t have this — not until 17 years later, after the fall of the Ptolemies.
  3. The reckoning of a solar year as 365¼ days originated with an astronomer from Anatolia and based in Athens, not Egypt.
  4. Egypt’s calendar had 365 days because of physical reality, not local customs. Egypt didn’t have a monopoly on the fact that a solar year lasts roughly 365 days.
  5. We have fairly extensive documentation of Caesar’s own work on the calendar and observations of seasonal phenomena, and that he wrote a detailed treatise on the subject.

Taking these points into account, the simplest reading is that Caesar and other ancient observers drew attention to Egypt because the Egyptian calendar was already close to the correct value, and not because the new calendar was based on it.

And if you look carefully at John Lydus’ account, above, you’ll see his story is clearly not true. The calendar of ‘antiquity’ that he describes is the Alexandrian calendar after 30 BCE. Caesar can’t have got the idea of a 365¼ day calendar from Egypt, as Lydus claims, because Egypt didn’t have a 365¼ day calendar at the time.

Note. Similarly Theodor Mommsen, writing over 160 years ago; ‘Sosigenes of Alexandria’ should have disappeared from the face of the earth after he wrote (1859: 295 n.22): ‘... scholarly opinion stamped Sosigenes as Alexandrian for lack of evidence. No ancient source is going to contradict my statement: I think unbiased judges will be persuaded that much older and weightier authorities characterise Caesar’s model as the Italian-Eudoxian calendar, and that this rules out the other proposition; that in real terms it is bizarre to bring from abroad what one has long had at home; that no direct borrowing from Egypt has yet been demonstrated in the organisation of the Julian year; and that it is therefore very difficult to understand why we must ‘in any case’ accept that Caesar’s advisers were Alexandrian. The consideration that the name Sosigenes — obviously a standard Greek name, and decidedly rare — also appears on Egyptian papyrus, and that there(?) it is probably derived from the deity Shu ... is so dubious that it suffices to mention it.’

Where did Caesar’s calendar really come from?

The astronomer who measured the solar year as 365¼ days was Kallippos of Kyzikos, who studied at Plato’s Academy and Aristotle’s Lyceum in Athens in the late 4th century BCE. Among other things, Kallippos determined exact periods between each of the solstices and equinoxes.

Previously Meton had measured a lunisolar cycle of 235 lunar months, corresponding to 19 solar years. Kallippos extended this to a cycle of 912 lunar months plus 28 intercalary months, or 27,759 days, corresponding to 76 solar years — an average of 365¼ days per year.

Note. For Kallippos’ 365¼ day year see Geminus, Phainomena 8.59–60; see further Neugebauer 1975: 615–624.

Meton’s 19-year cycle gave the year an average of 365 5/19 days, that is, 365.2632 days. Kallippos’ year of 365.25 days was an improvement on this. Later, as we saw, Hipparchos improved Kallippos’ calculation still further, to 365.2467 days.
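The cycle arithmetic can be checked with exact fractions. The 6,940-day length of Meton’s cycle is not stated in the text; it is the standard figure, and it yields exactly the 365 5/19 days quoted above. The sketch also confirms that Kallippos’ 76 years amount to four Metonic cycles less one day.

```python
from fractions import Fraction

METON_YEARS, METON_DAYS = 19, 6940           # 235 lunar months
KALLIPPOS_YEARS, KALLIPPOS_DAYS = 76, 27759  # 912 + 28 = 940 lunar months

meton_year = Fraction(METON_DAYS, METON_YEARS)
kallippos_year = Fraction(KALLIPPOS_DAYS, KALLIPPOS_YEARS)
kallippos_months = 76 * 12 + 28

print(float(meton_year))   # 365 5/19, about 365.2632
print(kallippos_year)      # prints 1461/4, i.e. 365 1/4
print(KALLIPPOS_DAYS == 4 * METON_DAYS - 1)  # four Metonic cycles less one day
print(round(float(Fraction(KALLIPPOS_DAYS, kallippos_months)), 3))  # mean lunar month, about 29.531 days
```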

And by the way, observe that Meton, Kallippos, and Hipparchos all lived in Greece, not Egypt. Alexandria has nothing to do with this story.

So, what should we infer: is it that Sosigenes made Kallippos’ work the basis for the Julian calendar? That’s a possible interpretation, except that it’s still missing a key fact.

You see, Caesar himself wrote a detailed treatise called De astris (‘On the stars’), on astronomy, on the length of the year, and containing a calendar with solar dates for numerous seasonal and astronomical phenomena, including solstices and equinoxes.

That is to say, it would appear that the Julian calendar is the result of Julius Caesar’s own research. He didn’t farm out the work to experts, he was the expert. When Pliny cites four schools of thought about measuring the sun’s progress around the ecliptic — the Chaldaean, Egyptian, Greek, and Italian schools — it’s Caesar himself that represents the Italian school.

The De astris doesn’t survive, alas. But the fragments, preserved in other extant sources, are collected in Alfred Klotz’s 1927 Teubner edition, and they strongly suggest that not only was Caesar knowledgeable about astronomy, his work was startlingly carefully thought out in other respects too. Caesar didn’t just lengthen the Roman year to 365¼ days. He had solid, specific reasons for the month lengths he adopted; he made his own observations of the stars and seasonal weather; he put a huge amount of effort into making the new calendar politically acceptable and into avoiding religious upsets; even more, he put careful thought into which days in each month were going to be the extra days.

Here are Caesar’s alterations:

Month | Republican calendar | Days added | Julian calendar
January | 29 days | 19 Jan, 20 Jan | 31 days
February | 28 | bissextus (leap day) | 28 or 29
March | 31 | | 31
April | 29 | 26 Apr | 30
May | 31 | | 31
June | 29 | 29 Jun | 30
Quinctilis/July | 31 | | 31
Sextilis/August | 29 | 29 Aug, 30 Aug | 31
September | 29 | 29 Sep | 30
October | 31 | | 31
November | 29 | 29 Nov | 30
December | 29 | 29 Dec, 30 Dec | 31
Total | 355 (≈ 12 lunar months of 29½ days) | 10¼ | 365¼
Note. For the exact dates Caesar added, see Macrobius, Saturnalia 1.14.7–9. Censorinus, De die natali 20.9, corroborates the number of days added to each month; the republican-era Fasti Antiates confirm Macrobius’ figures for months in the republican calendar (Degrassi 1957: 23–41).
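The month-by-month changes in the table can be checked mechanically. This Python sketch simply copies the table’s figures (leaving out the leap day) and confirms the totals, and that only the short months were lengthened.

```python
REPUBLICAN = {"Jan": 29, "Feb": 28, "Mar": 31, "Apr": 29, "May": 31, "Jun": 29,
              "Jul": 31, "Aug": 29, "Sep": 29, "Oct": 31, "Nov": 29, "Dec": 29}
# Days Caesar added to each month, per the table (the bissextus is left out).
ADDED = {"Jan": 2, "Apr": 1, "Jun": 1, "Aug": 2, "Sep": 1, "Nov": 1, "Dec": 2}

julian = {m: d + ADDED.get(m, 0) for m, d in REPUBLICAN.items()}

print(sum(REPUBLICAN.values()))  # 355: the republican common year
print(sum(ADDED.values()))       # 10: plus a quarter day via the leap day
print(sum(julian.values()))      # 365: the Julian common year
# The 31-day months were left alone; every lengthened month ends at 30 or 31.
print(all(m not in ADDED for m, d in REPUBLICAN.items() if d == 31))
```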

In both systems, intercalations — extra days to make the year line up with the seasons — were in February. In the republican system, an intercalary month could be inserted after 24 February, and in the Julian system, every fourth year the 24th of February would last two days, which Caesar called bissextus.

Incidentally, for those who already know something about the Roman calendar, the pre-Julian month lengths are the reason that ‘July, October, March, and May / have Nones the 7th, Ides the 15th day’. It’s because those are the months that had had 31 days all along; the 29-day months had them on the 5th and the 13th, and Caesar didn’t change that. The extra days he added were towards the end of each month, so as to avoid altering the dates of religious observances.

The dates of the solstices and equinoxes

The diagram I gave above is based on Pliny’s dates for the solstices and equinoxes:

(Daylight) increases from midwinter, and is equal to the night at the spring equinox, 90 days and 3 hours later. Then it exceeds the night up to the solstice, 94 days and 12 hours later. * * * up to the autumn equinox. And then, after it is equal to the daylight, the night increases until midwinter, 88 days and 3 hours later.
Pliny, Natural history 18.220

Pliny’s figure for the period between the summer solstice and autumn equinox is missing, but a period of 92.5 days is implied by the other figures and by a 365¼ day year. It so happens that Ptolemy gives two figures that agree with Pliny, including the missing 92.5 day period —
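Pliny’s missing interval follows from subtracting the three surviving intervals from a 365¼-day year:

```latex
365.25 - (90.125 + 94.5 + 88.125) = 365.25 - 272.75 = 92.5 \text{ days}
```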

[Hipparchos] assumes that the interval from spring equinox to summer solstice is 94½ days, and that the interval from summer solstice to autumnal equinox is 92½ days ...
Ptolemy, Almagest 3.4 (trans. Toomer)

The only date Pliny pinpoints is the winter solstice, on 25 December (NH 18.221). That implies the rest of the dates used in my diagram, above. Here are some other sets of solstice and equinox dates reported in the 1st and 2nd centuries BCE and CE.

Source | Spring equinox | Summer solstice | Autumn equinox | Winter solstice
Hipparchos | | 94.5 days after equinox | 92.5 days after solstice |
Caesar | (25 Mar?) | 24 Jun | 24 Sep |
Varro | 90 days after solstice; 45 days after Favonius (24 Mar) | 92 days after equinox (24 Jun) | 94 days later (26 Sep) | 89 days later (24 Dec)
Hyginus, Columella | 25 Mar | | 24 Sep | 25 Dec
Pliny | 90.125 days after winter solstice (25 Mar) | 94.5 days later (27/28 Jun) | 88.125 days before winter solstice (= 92.5 days after summer solstice, i.e. 28 Sep) | 25 Dec = 88.125 days later
Ptolemy | 22 Mar 140 CE | 24/25 Jun 140 CE | 26 Sep 139 CE |
Notes. Sources: Hipparchos, reported by Ptolemy, Almagest 3.4; Caesar, reported by Pliny, Natural history 18.246, 18.256, 18.312 (in book 1 Pliny cites Caesar’s De astris among his sources for book 18); Varro, Res rustica 1.28; Columella, De re rustica 9.14.1, 9.14.10–11 (citing Hyginus); Pliny, Natural history 18.220–221; Ptolemy, Almagest 3.4.

Parentheses denote information that the text does not state explicitly. In Caesar’s case, Pliny quotes the 25 March date in one sentence, then cites Caesar in the following sentence; it’s only Klotz’s edition that links the two. For Varro, the dates shown here are based on the inference that his date for Favonius (the west wind) coincides with the beginning of spring, which Varro puts on 7 February. See above on the 92.5-day period in Pliny.

Some dates are widely mistranslated and/or misreported. In the Loeb translation of Columella, viii calendas Aprilis is mistranslated as 24 Mar. (for 25 Mar.), and viii calend. Ianuarii as 23 Dec. (for 25 Dec.); in the Loeb of Pliny, viii kal. Ian. is mistranslated as 26 Dec. (for 25 Dec.). In addition, Columella is widely reported as putting the winter solstice on 24 Dec., even by Neugebauer. I cannot tell how these errors have arisen. The text in each case is clear and unambiguous.
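The disputed dates are written Roman-style, counting inclusively back from the Kalends, Nones, or Ides. A minimal Python sketch, assuming common (non-leap) Julian years and using English month abbreviations and the hypothetical helper name roman_date, shows why viii calendas Aprilis is 25 March and viii Kal. Ian. is 25 December. It also encodes the Nones and Ides rule for the four formerly 31-day months, and recovers 24 February as the day doubled by the bissextus.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Julian month lengths in a common year (the bissextus is ignored here).
LENGTH = dict(zip(MONTHS, [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]))
# Months that had 31 days before Caesar: Nones on the 7th, Ides on the 15th.
LATE = {"Mar", "May", "Jul", "Oct"}

def roman_date(n, marker, month):
    """Convert 'ante diem n <Kal|Non|Id> <month>' to (day, month).

    Counting is inclusive: n = 1 is the named day itself, n = 2 (pridie)
    is the day before, n = 8 is seven days before.
    """
    if marker == "Kal":
        if n == 1:
            return 1, month
        # Dates before the Kalends fall in the previous month.
        prev = MONTHS[MONTHS.index(month) - 1]
        return LENGTH[prev] + 2 - n, prev
    nones = 7 if month in LATE else 5
    if marker == "Non":
        return nones - n + 1, month
    if marker == "Id":
        return nones + 8 - n + 1, month  # the Ides fall 8 days after the Nones
    raise ValueError(f"unknown marker {marker!r}")

if __name__ == "__main__":
    print(roman_date(8, "Kal", "Apr"))  # viii calendas Aprilis
    print(roman_date(8, "Kal", "Jan"))  # viii Kal. Ian.
    print(roman_date(6, "Kal", "Mar"))  # the day doubled as the bissextus
```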

Ptolemy’s measurements are a day later than modern reckoning, which gives the dates as 5:27 pm 21 Mar. 140 CE, 3:11 pm 23 Jun. 140 CE, and 10:41 pm 24 Sep. 139 CE (= Terrestrial Time plus 2 hours for the longitude of Alexandria).

There’s a substantial body of scholarship over these dates, but mostly about which coordinate system the ancient authors use for setting the points and divisions of the seasons against the sun’s progress through the zodiac (Neugebauer 1975: 593–600), and on what the above table may have looked like in the republican calendar (Mommsen 1859: 54–79 on the rustic calendar, tabulation at 62).

There is certainly more variation than you might think from looking at Christian traditions about the dates of Easter and Christmas. Christians from the 3rd century onwards consistently used the dates 25 March and 25 December for the spring equinox and winter solstice, as I described last year.

Those are the dates we find in Pliny, Columella, and (so Columella tells us) Hyginus, the most celebrated Roman astronomer of Augustus’ time. It’s striking that in Pliny these dates, which disagree with Caesar, are stuck in between two discussions of Caesar.

NH 18.210–211 Caesar’s solar cycle; Sosigenes’ involvement
NH 18.220–221 Pliny’s solstice and equinox dates (which disagree with Caesar)
NH 18.232 onwards Lengthy catalogue of seasonal and astronomical phenomena, including some of Caesar’s solstice and equinox dates

Pliny’s solstice and equinox dates are of great interest. Given that they partially correspond to what Ptolemy says about Hipparchos, they could be derived from Hipparchos.

Actually I suspect they might be Kallippos’ own dates. Remember that the Julian calendar slips out of synch with the seasons by 1 day every 128 years. As a result, when ancient writers give us calendar dates for solstices, we can estimate when those dates were observed. We compare the quoted dates with the dates as calculated by modern astronomy, and see which period they’re valid for.

We do need to allow leeway. Ancient observers made mistakes measuring the equinoxes and solstices: Ptolemy himself erred by a day in 139–140 CE (see notes to table above). With that in mind, here are the date ranges where the quoted dates are valid, to varying degrees of tolerance.

Source | Dates quoted | Valid period, dates exactly as quoted | Valid period, ±1 day tolerance | Valid period, ±2 days tolerance
Caesar | 24 Jun, 24 Sep | 12 to 167 CE | 121 BCE to 287 CE | 253 BCE to 407 CE
Varro | 24 Mar, 24 Jun, 26 Sep, 24 Dec | no valid period | 165 BCE to 51 CE | 289 BCE to 179 CE
Hyginus, Columella | 25 Mar, 24 Sep, 25 Dec | no valid period | no valid period | 261 to 26 BCE
Pliny | 25 Mar, 27/28 Jun, 28 Sep, 25 Dec | 429 to 298 BCE | 557 to 162 BCE | 689 to 26 BCE

Caesar’s and Varro’s dates could plausibly have been observed in their own lifetimes. But in Pliny’s case, that’s quite a stretch. Pliny’s dates look as though they come from observations made much earlier.

It has been suggested that the 25 March/25 December dates could come from Hipparchos (thus Hannah 2005: 151, following a suggestion of Christian Ludwig Ideler in the 1820s). I think the full set of Pliny’s dates points to an earlier origin. And, given that Kallippos, the discoverer of the 365¼ day cycle, lived slap in the middle of the valid date range for Pliny, I’m going to suggest that Kallippos could also be the originator of Pliny’s season lengths and solstice dates.


References

  • Degrassi, A. 1957. Inscriptiones latinae liberae rei publicae, vol. 1. Florence.
  • Hannah, R. 2005. Greek & Roman calendars. Constructions of time in the classical world. London.
  • Klotz, A. (ed.) 1927. ‘vii. De astris.’ In: C. Iuli Caesaris commentarii, vol. 3. Teubner. 211–229. [Internet Archive]
  • Mommsen, Th. 1859. Die römische Chronologie bis auf Caesar, 2nd ed. Berlin. [Internet Archive]
  • Neugebauer, O. 1975. A history of ancient mathematical astronomy. Berlin/Heidelberg.

Unix command line conventions over time

This blog post documents my understanding of how the conventions for Unix command line syntax have evolved over time. It’s not properly sourced, and may well be quite wrong. I didn’t start using Unix until 1989, so I wasn’t there for the early years. Maybe someone has written a proper essay on this, with citations. I’m too lazy to dig them up.

Early 1970s

In the beginning, in the first year or so of Unix, an ideal was formed for what a Unix program would be like: it would be given some number of filenames as command line arguments, and it would read those. If no filenames were given, it would read the standard input. It would write its output to the standard output. There might be a small number of other, fixed, command line arguments. Options didn’t exist. This allowed programs to be easily combined: one program’s output could be the input of another.

There were, of course, variations. The echo command didn’t read anything. The cp, mv, and rm commands didn’t output anything. However, the “filter” was the ideal.

$ cat *.txt | wc

In the example above, the shell expands *.txt into the names of all files with a .txt suffix. The cat program reads those files and writes them to its standard output, which is then piped to the wc program, which reads its standard input (it wasn’t given any filenames) and counts the lines, words, and characters in it. In short, the pipeline above counts words in all text files.
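A minimal Python sketch of the filter convention described above; count and inputs are hypothetical names. The input rule is the important part: read the named files, or standard input if there are none. It counts characters rather than bytes, so on non-ASCII text it matches wc -m rather than wc -c.

```python
import sys

def count(streams):
    """Count lines, words and characters across an iterable of text streams."""
    lines = words = chars = 0
    for stream in streams:
        for line in stream:
            lines += 1
            words += len(line.split())
            chars += len(line)
    return lines, words, chars

def inputs(paths):
    """The filter convention: read the named files, or stdin if none are named."""
    if not paths:
        yield sys.stdin
        return
    for path in paths:
        with open(path, encoding="utf-8") as f:
            yield f
```

Calling print(*count(inputs(sys.argv[1:]))) from a script gives wc-like behaviour, so both script *.txt and cat *.txt | script work.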

This was quite powerful. It was also very simple.


Fairly quickly, the developers of Unix found that many programs would be more useful if the user could choose between minor variations of function. For example, the sort program could provide the option to order input lines without regard to upper and lower case.

The command line option was added. This seems to have resulted in a bit of a philosophical discussion among the developers. Some were adamantly against options, fearing the complexity they would bring, and others really liked them for the convenience. The side favoring options won.

To make command line parsing easy to implement, options always started with a single dash, and consisted of a single character. Multiple options could be packed after one dash, so that foo -a -b -c could be shortened to foo -abc.

If not immediately, then soon after, an additional twist was added: some options required a value. For example, the sort program could be given the -kN option, where N is an integer specifying which word in a line would be used for sorting. The syntax for values was a little complicated: the value could follow the option letter as part of the same command line argument, or be the next argument. The following two commands thus mean the same thing:

$ sort -k1
$ sort -k 1

At this point, command line parsing became more than just iterating over the command line arguments. The dominant language for Unix was C, and a lot of programs implemented the command line parsing themselves. This was unfortunate, but at this stage the parsing was still simple enough that most programs did it in similar enough ways that it didn’t cause any serious problems. However, it was now the case that one often needed to check the manual, or experiment, to find out how a specific program was to be used.
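The hand-rolled parsers of this era typically looked something like the following Python sketch, which assumes the conventions just described: single-letter options after one dash, clumping, and a value either attached (-k1) or in the next argument (-k 1). parse_short is a hypothetical name; unlike getopt, it does not reject unknown letters.

```python
def parse_short(argv, takes_value):
    """Parse classic single-dash options.

    takes_value: set of option letters that need a value, e.g. {"k"}.
    Supports clumping (-abc == -a -b -c) and both "-k1" and "-k 1".
    Stops at the first non-option argument or after "--".
    Returns (list of (letter, value or None), remaining operands).
    """
    opts = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            i += 1
            break
        if not arg.startswith("-") or arg == "-":
            break  # first operand (a lone "-" conventionally means stdin)
        j = 1
        while j < len(arg):
            letter = arg[j]
            if letter in takes_value:
                value = arg[j + 1:]  # rest of this argument, e.g. "1" in "-k1"
                if not value:        # otherwise the value is the next argument
                    i += 1
                    if i >= len(argv):
                        raise ValueError(f"option -{letter} requires a value")
                    value = argv[i]
                opts.append((letter, value))
                break
            opts.append((letter, None))
            j += 1
        i += 1
    return opts, argv[i:]
```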

Later on (in 1980, according to Wikipedia), the C library function getopt was written. It became part of the Unix C standard library and implemented the command line parsing described above. It was written in C, which at that time was quite a primitive programming language, and this resulted in a simplistic API. Part of that API is that if the user gave an unknown option on the command line, the getopt function would return a question mark (?) as its value. Some programs would respond by writing out a short usage blurb. This led to -? sometimes being used to tell a program to show a help text.
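Python’s standard getopt module deliberately mirrors C’s getopt conventions, so it is a convenient way to try them out. The option string "fk:" describes a hypothetical program with a -f flag and a -k option that takes a value, using the same colon notation as C. One difference: where C’s getopt returns '?' for an unknown option, Python raises getopt.GetoptError.

```python
import getopt

# "f" is a plain flag; "k:" takes a value, attached (-k1) or separate (-k 1).
opts, operands = getopt.getopt(["-fk1", "input.txt"], "fk:")
print(opts)      # each option comes back as a (name, value) pair
print(operands)

try:
    getopt.getopt(["-z"], "fk:")
except getopt.GetoptError as err:
    # C's getopt would return '?' here; Python raises instead.
    print("usage: prog [-f] [-k N] [file ...]", f"({err})")
```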

Long options

In the late 1970s Unix spread from its birthplace, Bell Labs, to other places, mostly universities. Much experimentation followed. During the 1980s some changes to command line syntax happened. The biggest change here was long options: options whose name wasn’t just a single character. For example, in the new X window system, the -display option would be used to select which display to use for a GUI program.

Note the single dash. This clashed with the “clumping together” of single-character options. Does -display mean which display to use, or the options -d -i -s -p -l -a -y clumped together? This depended on the program and how it decided to parse the options.

A further complication was that single-dash long options that took values couldn’t have the value in the same command line argument. So -display :0 (two arguments) was correct, but it could not be written as -display:0, because a simple C command line parser would have difficulty telling where the option name ended and the value began. What previously might have been written as a single argument, -d:0, now became two arguments.
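Today, glibc offers getopt_long_only, which parses X-style single-dash long options. The sketch below assumes glibc, and its option set (-display and -v) and function name are hypothetical. It accepts -display :0 as two arguments. It rejects -display:0, because a colon is not a recognised separator. glibc does accept -display=:1, since it treats = as the value separator for long options.

```c
#include <getopt.h>   /* getopt_long_only, struct option (glibc) */
#include <stddef.h>   /* NULL */

/* Hypothetical X-style options: -display NAME and the flag -v.
   Returns 0 on success, -1 on an unknown option or missing value. */
int parse_x_style(int argc, char *argv[], const char **display, int *verbose)
{
    static const struct option longopts[] = {
        { "display", required_argument, NULL, 'D' },
        { NULL, 0, NULL, 0 }
    };
    int c;
    *display = NULL;
    *verbose = 0;
    optind = 0;   /* glibc: 0 fully resets getopt's state between calls */
    opterr = 0;   /* suppress getopt's own error messages */
    /* getopt_long_only tries long options after a single dash as well;
       a lone "-v" is still treated as the short option v */
    while ((c = getopt_long_only(argc, argv, "v", longopts, NULL)) != -1) {
        if (c == 'D')
            *display = optarg;
        else if (c == 'v')
            (*verbose)++;
        else
            return -1;   /* '?': unknown option or missing value */
    }
    return 0;
}
```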

The world did not end, but a little more complexity had landed in the world of Unix command line syntax.

The GNU project

The GNU project was first announced in 1983. It was to be an operating system similar to Unix. One of the changes it made was to command line syntax. GNU introduced another long option syntax, I believe to resolve the confusion between single-dash long options and clumped single-character options.

Initially, GNU used the plus (+) to indicate a long option, but quickly changed to a double dash (--). This made it unambiguous whether a long option or clumped short options were being used.

I believe it was also GNU that introduced using the equals sign (=) to optionally attach a value to a long option. Option values could themselves be optional: --color could mean the same as --color=auto, but you could also say --color=never if you didn’t like the default value.
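Here is a sketch of an optional value using GNU getopt_long. It assumes the --color semantics described above, where no value means auto, and the enum and function name are hypothetical. It is not the behaviour of any particular GNU program. Optional values have one consequence worth knowing: they can only be attached with =. In --color never, the word never is an ordinary operand, not the option's value.

```c
#include <getopt.h>   /* getopt_long, struct option */
#include <stddef.h>   /* NULL */
#include <string.h>   /* strcmp */

enum color_mode { COLOR_NEVER, COLOR_AUTO, COLOR_ALWAYS };

/* Hypothetical --color[=WHEN] handling; returns a color_mode, or -1 on error. */
int parse_color(int argc, char *argv[])
{
    static const struct option longopts[] = {
        /* optional_argument: a value may be given, but only as --color=WHEN */
        { "color", optional_argument, NULL, 'c' },
        { NULL, 0, NULL, 0 }
    };
    int mode = COLOR_AUTO;    /* assumed default when --color is absent */
    int c;
    optind = 0;               /* glibc: full reset between calls */
    opterr = 0;
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        if (c != 'c')
            return -1;                        /* unknown option */
        if (optarg == NULL || strcmp(optarg, "auto") == 0)
            mode = COLOR_AUTO;                /* bare --color means --color=auto */
        else if (strcmp(optarg, "always") == 0)
            mode = COLOR_ALWAYS;
        else if (strcmp(optarg, "never") == 0)
            mode = COLOR_NEVER;
        else
            return -1;                        /* unrecognised WHEN */
    }
    return mode;
}
```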

GNU further allowed options to occur anywhere on the command line, not just at the beginning. This made things more convenient for the user.

GNU also wrote a C function, getopt_long, to unify command line parsing across the software produced by the project. I believe it supported the single-dash long options from the start. Some GNU programs, such as the C compiler, used those.

Thus, the following was acceptable:

$ grep -xi *.txt --regexp=foo --regexp bar

The example above clumps the short options -x and -i into one argument, and provides grep with two regular expression patterns: one attached with an equals sign, and one given as the next argument.
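A command line like the one above can be parsed with getopt_long. This sketch assumes glibc's default argument permutation, which is switched off when the POSIXLY_CORRECT environment variable is set. Its struct, function name and pattern limit are hypothetical, although the long option names mirror real GNU grep options. After parsing, glibc has moved the file operands to the end of argv, and optind points at the first one. The shell expands *.txt before grep runs, so grep itself only ever sees the resulting file names.

```c
#include <getopt.h>   /* getopt_long, struct option */
#include <stddef.h>   /* NULL */

#define MAX_PATTERNS 8   /* arbitrary cap for the sketch */

struct grep_opts {
    int line_regexp;                    /* -x / --line-regexp */
    int ignore_case;                    /* -i / --ignore-case */
    const char *patterns[MAX_PATTERNS]; /* -e / --regexp values */
    int npatterns;
    int first_file;                     /* argv index of the first file operand */
};

/* Returns 0 on success, -1 on an unknown option or too many patterns. */
int parse_grep(int argc, char *argv[], struct grep_opts *o)
{
    static const struct option longopts[] = {
        { "regexp",      required_argument, NULL, 'e' },
        { "line-regexp", no_argument,       NULL, 'x' },
        { "ignore-case", no_argument,       NULL, 'i' },
        { NULL, 0, NULL, 0 }
    };
    int c;
    o->line_regexp = o->ignore_case = o->npatterns = 0;
    optind = 0;   /* glibc: full reset between calls */
    opterr = 0;
    while ((c = getopt_long(argc, argv, "xie:", longopts, NULL)) != -1) {
        switch (c) {
        case 'x': o->line_regexp = 1; break;
        case 'i': o->ignore_case = 1; break;
        case 'e':                        /* --regexp=foo and --regexp bar both land here */
            if (o->npatterns == MAX_PATTERNS)
                return -1;
            o->patterns[o->npatterns++] = optarg;
            break;
        default:
            return -1;
        }
    }
    o->first_file = optind;   /* operands were permuted to argv[optind..argc-1] */
    return 0;
}
```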

The GNU changes have largely been adopted by other Unix variants. I’m sure those have had their own changes, but I’ve not followed them enough to know.

GNU also added standard options: almost every GNU program supports the options --help, --version, and --mail=ADDR.[1]

Double dash

Edited to add: Apparently the double-dash was supported already in about 1980 in the first version of getopt in Unix System III. Thank you to Chris Siebenmann.

Around this time, a further convention was added: an argument consisting of only two dashes (--) says that no further options to the command being invoked will follow. I believe this was another GNU change, but I have no evidence.

This is useful, for example, to remove a file whose name starts with a dash:

$ rm -- -f

For rm, it was always possible to give a fully qualified path, starting from the root directory, or to prefix the filename with a directory (rm ./-f), so this convention is not necessary for removing files. However, given that all GNU programs use the same function for command line parsing, rm gets it for free. Other Unix variants may not have that support, though, so users need to be careful.

The double dash is more useful for other situations, such as when invoking a program that invokes another program. An example is the cargo tool for the Rust language. To build and run a program and tell it to report its version, you would use the following command:

$ cargo run -- --version

Without the double dash, you would be telling cargo to report its version.
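A wrapper can use the same mechanism to separate its own options from the ones it passes on. This sketch is not how cargo is implemented. The function name is hypothetical, and it assumes glibc's getopt_long. A --version before -- belongs to the wrapper, while anything after -- is left untouched as the start of the forwarded arguments.

```c
#include <getopt.h>   /* getopt_long, struct option */
#include <stddef.h>   /* NULL */

/* Hypothetical "run"-style wrapper. Sets *want_version for its own
   --version and returns the argv index where the forwarded arguments
   begin (argc if there are none), or -1 on an unknown option. */
int split_wrapper_args(int argc, char *argv[], int *want_version)
{
    static const struct option longopts[] = {
        { "version", no_argument, NULL, 'V' },
        { NULL, 0, NULL, 0 }
    };
    int c;
    *want_version = 0;
    optind = 0;   /* glibc: full reset between calls */
    opterr = 0;
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        if (c == 'V')
            *want_version = 1;
        else
            return -1;
    }
    /* scanning stopped at "--", so argv[optind..] would go to the child program */
    return optind;
}
```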

Subcommands

I think that around the late 1980s, subcommands were added to the Unix command line syntax conventions. Subcommands were a response to many Unix programs gaining a large number of “options” that were in fact not optional at all, and were really commands. Thus a program might have “options” --decrypt and --encrypt, and the user was required to use exactly one of them. This turned out to be a little hard for many people to deal with, and subcommands were a simplification: instead of using option syntax for commands, just require commands instead.

I believe the oldest program that uses subcommands is the version control system SCCS, from 1972, but I haven’t been able to find out which version added subcommands. Another version control system, CVS, from 1990, seems to have had them from the beginning. CVS was built on top of yet another version control system, RCS, which had separate programs such as ci for “check in” and co for “check out”. CVS had a single program, with subcommands:

$ cvs ci ...
$ cvs co ...

Later version control systems, such as Subversion, Arch, and Git, follow the subcommand pattern. Version control systems seem to inherently require the user to do a number of distinct operations, which fits the subcommand style well, and also avoids adding large numbers of individual programs (commands) to the shell, reducing name collisions.
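A single program with subcommands usually comes down to a table that maps names to handler functions. This is a hypothetical sketch, not code from CVS. The handlers are stubs that return distinct numbers so the dispatch is observable, and dispatch expects argv to start at the subcommand name.

```c
#include <stdio.h>    /* fprintf */
#include <string.h>   /* strcmp, size_t */

/* Hypothetical stub handlers; a real tool would do the work here. */
static int cmd_checkin(int argc, char *argv[])  { (void)argc; (void)argv; return 10; }
static int cmd_checkout(int argc, char *argv[]) { (void)argc; (void)argv; return 20; }

/* One table replaces a family of separate programs such as ci and co. */
static const struct {
    const char *name;
    int (*run)(int argc, char *argv[]);
} subcommands[] = {
    { "ci", cmd_checkin },
    { "co", cmd_checkout },
};

/* argv[0] is the subcommand name; the handler gets argv unchanged,
   so it can parse its own options starting at argv[1]. */
int dispatch(int argc, char *argv[])
{
    if (argc < 1)
        return -1;
    for (size_t i = 0; i < sizeof subcommands / sizeof subcommands[0]; i++)
        if (strcmp(argv[0], subcommands[i].name) == 0)
            return subcommands[i].run(argc, argv);
    fprintf(stderr, "unknown subcommand: %s\n", argv[0]);
    return -1;
}
```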

Subcommands add further complications to command line syntax, though, when inevitably combined with options. The main command may have options (often called “global options”), but so can subcommands. When options can occur anywhere on the command line, is --version a global option, or specific to a subcommand? Worse, how does a program parse a command line? If an option is specific to a subcommand, the parser needs to know which subcommand is being used, if only so it knows whether the option requires a value or not.

To solve this, some programs require global options to come before the subcommand, which is easy to implement. Others allow them anywhere. All of them seem to require per-subcommand options to come after the subcommand.


The early Unix developers who feared complexity were right, but also wrong. It would be intolerable to need a separate program for every combination of a program with options. To be fair, I don’t think that’s what they would’ve advocated: instead, I think, they would’ve advocated tools that can be combined, and simplifying things so that fewer tools are needed.

That’s not what happened, alas, and we live in a world with a bit more complexity than is strictly speaking needed. If we were re-designing Unix from scratch, and didn’t need to be backwards compatible, we could introduce a completely new syntax that is systematic, easy to remember, easy to use, and easy to implement. Alas.

None of this explains dd.

  1. The --email bit is a joke.

Saturday Morning Breakfast Cereal - Normal

I can draw perspective JUST FINE, thankyou, it's just two giant women behind a tiny man, and the guy in the back is standing on 4 crates of oranges.
