This is a nice dataset, thanks for making it available.<br />Looking at the 20100811 archive, I noticed that the &quot;arin&quot; subdirectory contains both the unpacked files and two versions of sub-archives. I verified that these are completely redundant and can be removed with no loss of information, which would reduce the size of the entire archive significantly:<br /><br />-rw-r--r-- 1 leinen wheel 75469506 11 Aug 18:37<br />-rw-r--r-- 1 leinen wheel 192364964 11 Aug 08:41<br />
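For anyone who wants to repeat this kind of redundancy check, here is a minimal sketch of one way it could be done. It is not necessarily how the check above was performed. It assumes the sub-archive is a tar file (optionally compressed) whose member paths are relative to the unpacked directory; the function name `archive_is_redundant` and any paths passed to it are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def digest(fileobj, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary file object, read in chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def archive_is_redundant(archive_path, unpacked_dir):
    """Return True if every regular file in the tar archive also exists
    under unpacked_dir with byte-identical content.

    Assumption: member names in the archive are relative to unpacked_dir.
    """
    root = Path(unpacked_dir)
    # "r:*" lets tarfile detect gzip/bzip2/xz compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            on_disk = root / member.name
            if not on_disk.is_file():
                return False  # archive holds a file that is missing on disk
            with tar.extractfile(member) as packed, open(on_disk, "rb") as plain:
                if digest(packed) != digest(plain):
                    return False  # same name, different content
    return True
```

If the function returns True for a sub-archive, deleting that sub-archive loses no file content, although metadata such as timestamps and permissions is not compared here.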