
sha256p50

USE ONLY IF YOU KNOW WHAT YOU ARE DOING
 

To compute a hash, you have to consider the entire file. However, there are some non-security-related use cases where you want to trade off accuracy for speed.

This is where sha256p50 comes in (p50 means probability 50%). It splits the file into blocks and considers only every second block. You get a doubling of speed at the expense of certainty.

There will be other variations, such as sha256p10 with a 10x speedup and sha256p1 with a 100x speedup. Of course, sha256p100 = sha256.
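
A minimal sketch of how such a sampling hash could work, assuming "pN" means roughly N% of the blocks are read and using a 1 MiB block size; the function name, block size and stride scheme are this sketch's assumptions, not a specification:

    import hashlib

    def sha256p(path, percent=50, block_size=1 << 20):
        """Hash roughly `percent`% of the file by sampling every stride-th block."""
        stride = max(1, round(100 / percent))   # p50 -> 2, p10 -> 10, p1 -> 100, p100 -> 1
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                h.update(block)                  # hash the sampled block
                if stride > 1:
                    f.seek((stride - 1) * block_size, 1)   # skip the unsampled blocks
        return h.hexdigest()

Skipping with seek() rather than reading and discarding is what would make the sparse variants faster, at least on media where seeks are cheap.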

The important thing here is that this should not be relied upon for cybersecurity, because a malicious actor could just plant their malicious code within the non-scanned gaps. This is not to say that it's useless for cybersecurity; you can still use it as long as you understand the trade-off.

t+1s sha256p1: ok

t+10s sha256p10: FAIL

t+10s sha256p100: FAIL (induced)

The user knew that the hash didn't match within 10 seconds instead of having to wait 100s (the p100 result is "induced": once a sparse check mismatches, the full hash is guaranteed to mismatch too). So you can detect failures right away, but you HAVE TO wait the full 100s for a passing confirmation. You can't get into the habit of ctrl+c-ing it.
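
A sketch of the staged check described above, reusing the hypothetical sha256p() from the earlier sketch: run the sparsest (cheapest) level first and report a failure as soon as any level mismatches, but treat only a match at the 100% level as confirmation. This assumes the known-good side published sha256pN digests computed with the same block size and stride, since a sparse digest cannot be derived from the plain sha256:

    def staged_verify(path, expected):
        """expected: dict mapping percent -> known-good sha256p digest of the original."""
        for percent in sorted(expected):              # sparsest / fastest level first
            if sha256p(path, percent=percent) != expected[percent]:
                return f"FAIL at sha256p{percent}"    # a sparse mismatch already proves corruption
        return "ok"                                   # only a match at percent=100 is full confirmation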

It's one of those tools that would have to have a big warning on it (USE ONLY IF YOU KNOW WHAT YOU ARE DOING) ... but this being the halfbakery, I believe I get buns for that kind of thing.

ixnaum, Nov 07 2023

       What are the use cases?
pocmloc, Nov 07 2023
  

       I don't think the speedup would be linear with density, because most data storage has an access time which can be naively modelled as a constant wait and an output rate. So linear reads are much quicker than random reads.   

       You'd be better off just hashing all of the first half of the file, or whatever standard length or percentage you like. I think that's done sometimes, actually.
Loris, Nov 07 2023
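
A minimal sketch of the prefix alternative described in the annotation above: one sequential read over the first fraction of the file, which avoids seeking entirely (the fraction, chunk size and function name are placeholders):

    import hashlib, os

    def sha256_prefix(path, fraction=0.5, chunk=1 << 20):
        remaining = int(os.path.getsize(path) * fraction)   # bytes of prefix to hash
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while remaining > 0:
                block = f.read(min(chunk, remaining))
                if not block:
                    break
                h.update(block)
                remaining -= len(block)
        return h.hexdigest()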
  

       I came up with this when doing some large transfers (dd). Yes, I could see progress with 'pv', but I wanted to see if the data was actually getting written the way I wanted (because I piped it through gzip). Instead of doing the hash across the whole disk (which would take forever), I did it at the beginning, middle and end to satisfy myself that it's "probably all ok".   

       I did a poor man's implementation of sha256p0.001   

       The other use case can be just impatience. If you get a lot of hash mismatches, why wait hours to get a fail when you can detect it in seconds?
ixnaum, Nov 08 2023
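
A rough sketch of the "poor man's" spot check described in the annotation above: hash a fixed-size sample from the beginning, middle and end of an image (or device), then compare the three digests between source and copy. The 64 MiB sample size and the choice of offsets are assumptions of this sketch:

    import hashlib, os

    def spot_digests(path, sample=64 << 20):
        digests = []
        with open(path, "rb") as f:
            f.seek(0, os.SEEK_END)
            size = f.tell()                    # works for regular files and block devices
            offsets = [0, max(0, size // 2 - sample // 2), max(0, size - sample)]
            for off in offsets:
                f.seek(off)
                digests.append(hashlib.sha256(f.read(sample)).hexdigest())
        return digests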
  
      