Communications Encryption Government Security Your Rights Online

HTTP GZIP Compression Leaks Data On the Location of Tor Web Servers 79

An anonymous reader writes: The GZIP compression format includes a field in its header that shows the Web server's local date, at which the data was gzipped. Almost all Web servers use "zeros" to pad this field by default, citing performance issues. Around 10% of Tor site operators have removed this feature and are printing the packet's compression date. Unknown to them, this "server local date" leaks the Tor site's timezone which law enforcement can then narrow down to a specific geographical area. Coupled with other Tor protocol leaks, this could help deanonymize .onion sites.

  • Tor is looking more and more "holey" all the time.

    I can't help but wonder if the recent glibc DNS issue is not also a help in this deanonymization.

    It seems to me there are fewer and fewer possibilities to escape the global panopticon.

    • You thought it was the onion ring because of the layers, no it's because of the hole in the middle.
    • This is not a problem with Tor. This is the server operator failing to properly anonymize their server.

      It's like if I go and download and use the Tor Browser, but then fall victim to a phishing scam and give out personal information while using it. Tor will anonymise your connection to websites perfectly fine, but you the user are leaking information about yourself and Tor can't do anything about that. This is the same kind of issue.

    • by LWATCDR ( 28044 )

      Yeah, because knowing the server's location at the resolution of a timezone will help a lot...

      • by arth1 ( 260657 )

        Yeah, because knowing the server's location at the resolution of a timezone will help a lot...

        If the time zone is the Winamac time zone in Indiana, or some of the other very regional time zones, it may.

        It's just another datum in fingerprinting, but in some cases, it may be the crucial one.

      • by Motherfucking Shit ( 636021 ) on Monday February 22, 2016 @01:55PM (#51559897) Journal

        It could be more helpful than you think. If the server says its timezone is in the US, for example, that may be enough for a judge to grant the FBI a warrant authorizing god-knows-what attacks against it.

        • by yacc143 ( 975862 )

          Oh, the FBI doesn't need that to get a warrant nowadays, they are happily hacking foreign servers too.

    • by pr0nbot ( 313417 )

      I see TOR kind of like HTTPS: it won't necessarily keep your transmission from being decrypted and deanonymized, but it probably makes it much harder to do so. As such it just sort of raises your default level of privacy (from plain HTTP).

    • by gweihir ( 88907 )

      Bullshit. TOR security is getting fine-tuned, that is all. Of course, no sane TOR site operator would configure the correct timezone...

  • by Anonymous Coward
    Patch it to use the same time-zone (e.g. UTC+0)?
    • by The-Ixian ( 168184 ) on Monday February 22, 2016 @11:56AM (#51558681)

      Or just pad it with zeros like everything else does, apparently.

      Better to go with the flow in this case instead of trying to be clever.

      • Re: (Score:3, Informative)

        by Gr8Apes ( 679165 )
        All my servers are set to GMT. Why? Because when you're running across multiple TZs, it's a hell of a lot easier to trace logs when they use a single common global time. My activities don't care if they're in Asia/Tokyo, Europe/Berlin, Australia/Melbourne, or America/New_York, especially when services cross those regions.
        • by dAzED1 ( 33635 )
          please tell me you're kidding. Do you not know what UTC is? Does your logging software not know what UTC is? I mean fark, syslog does utc by default, is this some stupid lennart poettering thing again? Did he re-write syslog because it didn't do message queueing, and now you can't turn on the kitchen sink without logstart running, and it doesn't know what UTC is?
      • Or just pad it with zeros like everything else does, apparently.

        Even better would be to fill it with a value for a randomly selected TZ. That way you are poisoning the data, so "they" cannot be sure if any TZ fields are valid.

    • by Anonymous Coward

      Nope, use Stardates. That way the authorities need to check all of Federation Space for your hidden service.

  • by AmiMoJo ( 196126 ) on Monday February 22, 2016 @11:57AM (#51558691) Homepage Journal

    For very large values of location.

    • by Anonymous Coward

      Meta data is in the meta. The time itself may not matter, but if you combine that with native language you can probably get to the country--particularly for English, Portuguese, and many of the smaller localized languages of eastern Europe, Africa, SE Asia. The one probably most difficult is Spanish since many countries that speak Spanish are in a vertical column (adjacent timezones) in the Americas (sorry Spain)

  • by Anonymous Coward on Monday February 22, 2016 @11:57AM (#51558701)

    Relevant parts of the Gzip specification, RFC-1952:

    2.3.1
                      MTIME (Modification TIME)
                            This gives the most recent modification time of the original
                            file being compressed. The time is in Unix format, i.e.,
                            seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
                            may cause problems for MS-DOS and other systems that use
                            local rather than Universal time.) If the compressed data
                            did not come from a file, MTIME is set to the time at which
                            compression started. MTIME = 0 means no time stamp is
                            available.

    7.
    When compressing or decompressing a file, gzip preserves the
          protection, ownership, and modification time attributes on the local
          file system, since there is no provision for representing protection
          attributes in the gzip file format itself. Since the file format
          includes a modification time, the gzip decompressor provides a
          command line switch that assigns the modification time from the file,
          rather than the local modification time of the compressed input, to
          the decompressed output.
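
    To make the quoted layout concrete, here is a minimal Python sketch that pulls MTIME out of a gzip header. It assumes only what the spec text above says (a 4-byte field at offset 4, little-endian, seconds since the Unix epoch). The names read_gzip_mtime and make_gzip and the sample timestamp are made up for illustration and do not come from the RFC or from TFA.

```python
import gzip
import io
import struct
from datetime import datetime, timezone


def read_gzip_mtime(blob: bytes) -> int:
    """Return the MTIME field (bytes 4..7, little-endian) of a gzip member."""
    if len(blob) < 10 or blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    (mtime,) = struct.unpack("<I", blob[4:8])
    return mtime


def make_gzip(payload: bytes, mtime: int) -> bytes:
    """Compress payload in memory with an explicit MTIME (helper for the demo)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


# Arbitrary sample timestamp, chosen only for the demonstration.
sample = make_gzip(b"hello", mtime=1456142220)
stamp = read_gzip_mtime(sample)
# Interpreted per the spec, the value is UTC-based and carries no timezone.
print(stamp, datetime.fromtimestamp(stamp, tz=timezone.utc).isoformat())
```

    A value of 0 from read_gzip_mtime corresponds to the "no time stamp is available" case in the spec.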

    • by unrtst ( 777550 ) on Monday February 22, 2016 @12:22PM (#51558991)

      Vote parent up.

      The article the summary references is just a summary of this: http://jcarlosnorte.com/securi... [jcarlosnorte.com]

      In which, he notes:
      Offset  Size  Value      Description
      0       2     0x1f 0x8b  Magic number to identify gzip streams
      2       1                Compression method
      3       1                Flags
      4       4                Compression Date
      8       1                Compression flags
      9       1                Operating system

      He references that as coming from: http://www.forensicswiki.org/w... [forensicswiki.org]
      But that document does not say "Compression Date". It actually says:

      4 4 Last modification time. Contains a POSIX timestamp.

      Even his proof of concept shows that he's parsing that field as a POSIX timestamp: https://github.com/jcarlosn/gz... [github.com]

      echo date('l jS \of F Y h:i:s A', $rdate);

      It appears that either:

      a) Something else in his php script is setting the TZ before doing that parse
      b) The server is calculating the POSIX timestamp incorrectly, which is a similar issue but quite a different root cause.

      • by unrtst ( 777550 ) on Monday February 22, 2016 @12:32PM (#51559065)

        ... just to confirm, the answer is "b": The server is calculating the POSIX timestamp incorrectly, which is a similar issue but quite a different root cause.

        I updated his script to print the difference between the current POSIX timestamp and the value returned by the server.
        bing.com: current - server_value = 28800
        reddit.com: 0
        instagram.com: 0

        Those were his three tests. I'm not surprised the Microsoft server is the one calculating a POSIX timestamp incorrectly. MS folks tend to do timestamp math very poorly. I suspect this only affects Microsoft servers, or horribly misconfigured $anything_else.
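
        The comparison above can be sketched in a few lines of Python. This is a hedged illustration, not the modified PHP script: it assumes the misbehaving server writes its local wall-clock time into MTIME as if it were UTC, and it rounds to the nearest 15 minutes to absorb clock skew and latency. The function name and the reference timestamp are hypothetical.

```python
def apparent_utc_offset_hours(server_mtime: int, reference_utc: int) -> float:
    """Estimate the UTC offset implied by a gzip MTIME.

    server_mtime:  the 4-byte MTIME value read from the response header.
    reference_utc: the observer's own POSIX timestamp at request time.
    A spec-compliant server should yield 0.0 (give or take clock skew).
    """
    diff = server_mtime - reference_utc
    quarter_hours = round(diff / 900)  # 900 s = 15 min, the finest common TZ step
    return quarter_hours * 900 / 3600


now = 1456142220  # arbitrary reference time for the demonstration
print(apparent_utc_offset_hours(now - 28800, now))  # the "current - server = 28800" case
print(apparent_utc_offset_hours(now + 2, now))      # a compliant server with slight skew
```

        Under that assumption, a difference of 28800 seconds works out to an apparent offset of -8 hours, i.e. a UTC-8 wall clock, while the two zero results show no offset at all.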

    • undocumented gzip (Score:5, Informative)

      by TopSpin ( 753 ) on Monday February 22, 2016 @12:32PM (#51559059) Journal

      There are undocumented gzip command line switches (-m, -M) that control embedding timestamps in gzip archives. They're not mentioned in the man page or --help output, but you can see them in the source here (line 344): http://git.savannah.gnu.org/cgit/gzip.git/tree/gzip.c [gnu.org]

      #ifdef UNDOCUMENTED
      " -m, --no-time do not save or restore the original modification time",
      " -M, --time save or restore the original modification time",
      #endif

      I learned about this because I had to ensure consistent hash values of build artifacts for regulatory reasons, and I believe it is a misfeature. For me the Principle of Least Surprise would have gzip produce the exact same output given the same input, by default. As it is you get a slightly different output each time you compress the same set of bits, and that is entirely down to this timestamp. I think the fact that switches to achieve that behavior exist yet are undocumented betrays some conflict about this.

      • by sanvila ( 659083 )
        You don't really have to bother with undocumented features.

        In the reproducible builds effort [reproducible-builds.org], "gzip -n" is the norm.

      • I think it's because gzip only ensures the content stays the same. The archive itself can change from version to version or implementation to implementation.
      • by jrumney ( 197329 )
        I'd expect gzip to give you exactly the same output when run on exactly the same file each time. Only when you use it in a pipe (operating on stdin) should it use the current timestamp instead of the modification timestamp of the input file.
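
      For anyone compressing in code rather than on the command line, here is a rough Python analogue of the timestamp-free behavior discussed above. It is a sketch under assumptions: it relies on the standard library's gzip.GzipFile accepting mtime=0 and an empty filename, and gzip_reproducible is a name invented for this example. It makes no claim about how any particular web server builds its responses.

```python
import gzip
import io


def gzip_reproducible(payload: bytes) -> bytes:
    """Compress payload with MTIME zeroed and no stored file name."""
    buf = io.BytesIO()
    # mtime=0 writes four zero bytes into the header; filename="" omits FNAME.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()


first = gzip_reproducible(b"same set of bits")
second = gzip_reproducible(b"same set of bits")
print(first == second)  # identical input, identical output
print(first[4:8])       # the MTIME field
```

      With the timestamp pinned, two runs over the same input on the same library version hash identically, and there is no clock value in the header to leak.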
    • Why has the GZIP specification not been updated since the era of MS-DOS? Seriously?
      • Re: (Score:2, Interesting)

        by Anonymous Coward

        Why should it change?

        It's bad enough that any encrypted zip files get bullshit corruption error messages on windows XP and AES encrypted zip files get bullshit corruption error messages on newer versions of windows (see also the bullshit error messages you get when you try to use XP/IE on an HTTPS server configured with modern TLS ciphers, jesus fucking christ Microsoft, can't you just write "this file/website uses a newer, more secure protocol than this version of windows supports upgrade now to windows 10

  • So it's effectively 6 billion divided by 24, and easily mitigated: just set a different timezone if it's your own server you're going in on, or connect to a Tor server in a different time zone.
  • by marcansoft ( 727665 ) <hector@TOKYOmarcansoft.com minus city> on Monday February 22, 2016 @12:09PM (#51558851) Homepage

    RFC1952 clearly states that the mtime header is a POSIX timestamp, i.e., it is in universal time and not local time. The author of TFA somehow either completely missed or neglected to mention the fact that, per spec, there is no leakage of the timezone, and in fact two of his examples demonstrate exactly that.

    Of the three examples cited in TFA, two of them - reddit.com and instagram.com - follow the spec and use POSIX time. Just run the php tool from TFA and you'll see that the time returned matches the current UTC time. Those servers aren't leaking their location because they follow the spec.

    Only one example - bing.com - uses something other than POSIX time. Surprise surprise, some Windows-based server - presumably IIS? - ignores the standard and leaks the timezone in the process.

    Now the question is, are people seriously running TOR hidden services on Windows machines? That just seems like asking for trouble. The operational security requirements of TOR hidden services are significantly higher than your average server, and I bet the chances of screwing that up with a Windows server are much higher. Leaking the timezone is probably the least of your worries in that case.

    TL;DR Some Windows web server mis-implements the gzip standard and leaks the local timezone in the process. Spec-compliant web servers are not affected. TFA mis-identified two compliant servers as being affected. TFA did not list any Tor hidden services that are affected to allow for confirmation. This is mostly a non-issue.

    • It makes for good television.
    • Hi marcan, nice to have you included in the discussion. I'm the author of the article. To be fair, the gzip specification was clear from the beginning. In fact I was reading this exact specification when I thought about the impact it could have on tor hidden services. The specification itself clearly states that there are potential problems with universal times under certain systems (i.e. MS-DOS at the time of the specification writing). I thought that maybe current implementations could be flawed, develo
    • by DarkOx ( 621550 )

      Still leaks information. Even if the time stamp is always in UTC, it remains possible to confirm the server is not traveling at a high enough velocity relative to the requester to cause a difference in observed time.

      This is serious folks, it likely means you can in fact determine the server to be on the same planetary body or even in the same orbit!

      • The transceiver in the NIC completely obliterates any data you can gather from relativistic effects in the delays it introduces in its own media converter to go from the chip to the wire line voltages.

  • Or they can intentionally set their timezone to a different value to mislead...
    Chances are, if zeroes are the default and Tor sites have explicitly turned this off, then that's exactly what they've done... People running sites via Tor are likely to be privacy conscious, so if they've changed a setting to a non-default value they probably did it for a reason.

  • by Anonymous Coward

    Hide the offset, Use UTC exclusively

  • I usually set mine to UTC, no matter where they are.

    For this kind of leak I might "accidentally-on-purpose" select a timezone the machine doesn't happen to be in.

  • Why would you be doing a compression operation OVER AN EXTERNAL NETWORK, rather than crunching the file locally, then transmitting the compressed data over the internet? Unless, that is, you're seriously of the opinion that you can get higher actual communication speeds over a shared line servicing hundreds or thousands of other customers (compared to cables in your own wiring loom). That's to say nothing about the latencies and delays that are inherent in the Tor system itself.
