Almost uncorrupted files

Now compression tools are powerful enough, bio-engineer the ASCIent code!
(-3)

ASCII code (in its extended, 8-bit form) translates numbers in the range 0-255 into symbols (most of which are seldom used).

The genetic code (used by most living beings) translates DNA bases (a 4-letter alphabet: A, C, G, T) into amino acids (the 20-letter alphabet of proteins).

A single base gives 4x1 = 4 codes. No use: 16 amino acids would go unencoded. Base pairs give 4x4 = 16. Humpf! Almost there, but still 4 'unspeakable' amino acids. The base triplet, or 'codon', gives 4x4x4 = 64. More than enough!! The technical word for 'more than enough' is 'degenerate': some amino acids are encoded by a single codon, some by two, and so on, up to six.

This means the first two bases in the triplet are mostly 'essential', while the third (the 'wobble' position) is often 'redundant': it may accept a mutation and still code for the same amino acid.
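
A rough illustration of that degeneracy (a hand-picked slice of the standard codon table, not the full 64 entries; the helper function is just for this sketch):

# Toy illustration of third-position ("wobble") redundancy in the genetic code.
# Only a hand-picked slice of the standard codon table is included here;
# the real table has 64 entries.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",   # glycine: any third base works
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",   # alanine: any third base works
    "GAT": "Asp", "GAC": "Asp",                               # aspartate
    "GAA": "Glu", "GAG": "Glu",                               # glutamate
    "TGT": "Cys", "TGC": "Cys",                               # cysteine
    "TGA": "Stop",                                            # a stop signal
    "TGG": "Trp",                                             # tryptophan: a lone codon, no slack
}

BASES = "ACGT"

def silent_fraction(codon):
    # Fraction of third-base mutations that leave the encoded amino acid unchanged.
    original = CODON_TABLE[codon]
    mutants = [codon[:2] + b for b in BASES if b != codon[2]]
    return sum(CODON_TABLE.get(m) == original for m in mutants) / len(mutants)

for codon in ("GGT", "GAT", "TGG"):
    print(codon, CODON_TABLE[codon],
          f"silent third-base mutations: {silent_fraction(codon):.0%}")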

Imagine doubling the number of code points in standard ASCII, from 256 to 512. Useless! A waste of memory/disk room! Yeah, all of that, and more. But you don't want a binary 100 0111 (capital G) single-bit corrupted into 101 0111 (capital W), and you definitely won't accept a 100 1000 -> 100 0000 (H -> @) change in your texts.

There would be several good rules to apply to the new code, such as giving highly used characters more (synonymous) binary numbers. For compression purposes, the efficiency loss could be minimized by choosing among the synonyms so that code redundancy is put to use: if the chosen binary number for character C1 doesn't compress well alongside the chosen binary number for character C2, try alternative encodings until compression is optimal or the time cost becomes unaffordable.
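
A minimal sketch of what such a 'degenerate ASCII' could look like (my own toy construction with made-up codewords, not any standard): a few frequent characters each get a canonical 9-bit codeword, plus every pattern within one bit-flip of it as synonyms, so a single corrupted bit still decodes to the same character. Canonical codewords are picked at least 3 bit-flips apart, so the synonym sets can never collide.

def hamming(a, b):
    # Number of differing bits between two codewords.
    return bin(a ^ b).count("1")

def ball(word, bits=9):
    # The codeword itself plus every pattern one bit-flip away from it.
    yield word
    for i in range(bits):
        yield word ^ (1 << i)

# Greedily pick canonical 9-bit codewords at pairwise distance >= 3, so the
# distance-1 synonym sets of different characters cannot overlap.
frequent_chars = [' ', 'e', 't', 'a', 'o', 'h', 'n']   # illustrative choice only
canonical = {}
taken = []
for ch in frequent_chars:
    for w in range(512):
        if all(hamming(w, t) >= 3 for t in taken):
            canonical[ch] = w
            taken.append(w)
            break

# Decoding table: every synonym maps back to the character it stands for.
decode = {syn: ch for ch, w in canonical.items() for syn in ball(w)}

sent = canonical['t']            # transmit a 't'
received = sent ^ (1 << 4)       # one bit gets flipped on the way
print(decode[received])          # still reads as 't'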

Double the number of needles and the amount of hay in the stack: any magnet will be able to pull both needles in less than twice the time required for a single one.

mayihave, Jul 25 2007

Wikipedia: Forward error correction http://en.wikipedia...ror-correcting_code
[jutta, Jul 26 2007]


Annotation:







       As far as I'm concerned, computers are and will remain absolutely useless until they learn to perform internal operations in plain English.
nuclear hobo, Jul 25 2007
  

       my humpf! my humpf my humpf my humpf!
bungston, Jul 25 2007
  

       ASCII what you can do for your computer, not what your computer can do for you.
xenzag, Jul 26 2007
  

       I just had wonderful mental images of computer internals of various nationalities.   

       "No, don't be interrupting me, listen what I'm telling, this is a matter for the accountant only"   

       Disclaimer: Any nationality assumed from the above text is the reader's responsibility.
marklar, Jul 26 2007
  

       Really interesting link, the one about forward error correction. Thank you for that.   

       Humm, as for compression yield: of course non-optimal compression would be achieved by this method. The point I was trying to make is that living organisms use redundant codes (such as the genetic code) in order not to lose information even after billions of replicating generations. Error rates in DNA copying range from 10^-7 to 10^-10 (that's one thousand to one million times better than an average book proofreading outcome), and that's why organisms such as bacteria can copy their entire genetic information almost losslessly (it was never meant to compress anything). Anyway, from what you showed me, it's plain that engineers will keep their jobs for a long while, because of their ability to tell the difference between 'signal' and 'noise'.

       The second point, if ever there was a chance for it, was, hummm, well: how good would zip performance be on that extended redundant code? And the answer is 'not as bad as expected', because of the redundancy itself.
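
One quick way to put a number on that claim (just a toy of my own: every byte gets two synonymous 16-bit codewords distinguished by a flag byte, and the 'synonym rule' is either fixed or random; nothing here is from the original post):

import random
import zlib

text = ("the quick brown fox jumps over the lazy dog. " * 200).encode("ascii")

def encode(data, pick_synonym):
    # Map every byte to one of two synonymous 16-bit codewords: (0, b) or (1, b).
    out = bytearray()
    for b in data:
        out.append(pick_synonym(b))   # flag byte: which synonym was chosen
        out.append(b)                 # the character itself
    return bytes(out)

random.seed(42)
variants = [
    ("plain text", text),
    ("doubled, fixed synonym", encode(text, lambda b: 0)),
    ("doubled, random synonyms", encode(text, lambda b: random.randint(0, 1))),
]
for label, blob in variants:
    print(f"{label:>25}: {len(blob):6d} raw, {len(zlib.compress(blob, 9)):6d} zipped")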

       Thank you all for discussing.
mayihave, Jul 27 2007
  

       1) The range of ASCII is 0-127 2) ASCII sucks anyway 3) Use RAID
ironfroggy, Jul 28 2007
  

       Well, I don't understand half of the ideas above, hummm. Ever since 'message science' began, people have been trying to figure out how to encode things. An ancient Chinese Emperor ruled there should be 80,000 ideographic-syllabic characters at the most, because Chinese was already almost impossible to learn. What the... do people mean by comments such as //ASCII sucks anyway//? Of course it sucks, man! Unused characters suck more than frequently used characters.

       (I'll give you an excerpt of Harry Potter's character frequencies at the bottom of this comment.) Just bear in mind that there are only 58 different symbols in that book, and the top 30 most frequent characters make up >99% of the volume's 1,367,463 bytes. In fact, >51% of those bytes code for one of the top 6 characters in the ranking. A mini-micro-tiny-ridiculous 'ascii-6' would 'explain' half the goddamn book!! Sounds like an outstanding statistic to begin with, doesn't it? I'm already half-baking a second-order Markov model so that frequent character pairs (such as 'th') can be 'condensed' into shorter codes.

       Char.   Freq.      Char.   Freq.
       blank   0.176      w       0.018
       e       0.104      c       0.015
       t       0.063      f       0.014
       a       0.058      .       0.013
       o       0.056      m       0.012
       h       0.055      ‘       0.012
       n       0.051      b       0.011
       i       0.050      p       0.010
       s       0.050      ,       0.010
       r       0.049      v       0.010
       d       0.039      k       0.008
       l       0.028      H       0.008
       g       0.026      I       0.003
       u       0.024      T       0.002
       y       0.019      K       0.002
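
For reference, a short sketch of how counts like these (and the pair counts for a Markov-style model) could be reproduced; the filename is only a placeholder, nothing here comes from the original post:

from collections import Counter

# "book.txt" is a placeholder; any large plain-text file will do.
with open("book.txt", encoding="latin-1") as f:
    text = f.read()

total = len(text)
char_freq = Counter(text)
pair_freq = Counter(a + b for a, b in zip(text, text[1:]))   # raw material for pair codes

print(f"{len(char_freq)} distinct symbols in {total} characters")
for ch, n in char_freq.most_common(10):
    print(repr(ch), f"{n / total:.3f}")

top30 = sum(n for _, n in char_freq.most_common(30))
print(f"top 30 symbols cover {top30 / total:.1%} of the text")
print("most frequent pairs:", [p for p, _ in pair_freq.most_common(5)])
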
mayihave, Jul 28 2007
  

       I am not sure I can follow your reasoning: in the idea you propose more than one code for often-used chars, and in the anno you propose less than two codes for two chars (th). So you would add a code for 'th', and then even give 'th' more than one code on account of its ubiquitousness?

       And where would the gain be of zipping a redundantly coded text? - the redundancy would disappear.
loonquawl, May 27 2009
  

       //living organisms use redundant codes (such as the genetic code) in order not to lose information even after billions of replicating generations.//   

       That's not true in the way you meant it. I think you've got the wrong end of the stick.
The redundancy in 'the genetic code' - that is, the translation of DNA to protein sequence - is down to having 20 amino acids (plus 'stop') encoded within 64 codons[1]. The redundancy is used in a way which minimises the effect of some mutations, but if it were somehow not present, it wouldn't make any difference to the mutation rate. It doesn't end there, as there can be some information encoded within the choice among a set of codons for an amino acid[2], so supposedly silent mutations may still have an effect.
  

       There is redundancy which is used for error-correction, but that relies on there being two copies of the entire message: DNA has two strands which hold complementary information. If a base on one strand is damaged, or mis-encoded during copying, the error can be repaired by reference to the other strand. Higher organisms are mostly diploid, having two DNA molecules encoding (virtually) the same message; thus they really have four DNA strands carrying their genome.

       Modern error-correcting codes are much more efficient than this, in any case. They don't even need twice as many bits to correct the relevant types of error.
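
For instance (a standard textbook construction sketched from scratch here, not anything taken from the idea or the link), the Hamming(7,4) code corrects any single flipped bit while adding only 3 parity bits per 4 data bits:

def hamming74_encode(d):
    # Encode 4 data bits as a 7-bit codeword (positions 1..7; parity at 1, 2, 4).
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Correct up to one flipped bit, then return the 4 data bits.
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3      # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[5] ^= 1                              # flip one bit in transit
assert hamming74_decode(word) == data     # still recovered exactly
print("recovered:", hamming74_decode(word))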

         

       [1] 20-ish. There are some special cases we needn't concern ourselves with.
[2] Depending on the organism, one of the codons for an amino-acid may be favoured, especially in highly expressed genes. There is a higher concentration of the corresponding tRNA in the cell. This is essentially an optimisation of translation speed. Mutating away from that codon may be deleterious, just not as bad as a change to another amino-acid.
Loris, May 27 2009
  


 
