Halfbakery: Hyphenation As Subtraction

Hyphenation As Subtraction (+3)
Shorten hyphenated words by subtracting the second from the first

My father once came home from work and announced that the computer (presumably a '70s mainframe or mini) had broken down because it had attempted to subtract a double-barrelled surname.

Although I'm not sure this is true, this kind of thing can be done. Just as hexadecimal consists of the digits 0 to 9 along with A to F, base 36 could be 0 to 9 along with A to Z. This means that a word such as "bad" would have a value of 14629. Therefore a word such as "half-baked" would then be the numbers 20451 and 28869. Subtract the second from the first and you have the "word" "-6hu", which is shorter when written but not said. On the other hand, if the first word in a hyphenated term has the larger value, the result will be positive, an example being "mid-80s", whose result is "ehl", which can basically be pronounced.

Therefore, my proposal is quite simple: shorten texts by using lots of hyphenated words and saying the result of the second subtracted from the first. It will be hard to pronounce and often lengthen words in speech, but otherwise it's fine.
-- nineteenthly, Oct 14 2014
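
For anyone wanting to try the base-36 arithmetic from the post, here is a minimal Python sketch. The names to_base36 and hyphen_subtract are made up for illustration; the only library call relied on is the built-in int(text, 36). It assumes each term contains a hyphen and only the characters 0-9 and a-z.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n: int) -> str:
    """Render an integer in base 36, keeping a leading minus sign."""
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))

def hyphen_subtract(term: str) -> str:
    """Split on the first hyphen and subtract the second part from the first."""
    first, second = term.split("-", 1)
    return to_base36(int(first, 36) - int(second, 36))

print(int("bad", 36))              # 14629
print(hyphen_subtract("mid-80s"))  # ehl
```

Python integers are unbounded, so hyphen_subtract("half-baked") gives "-at9sy" here; a machine with narrower integers could arrive at different figures.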

Yes, but what if "bad-ass" comes out the same as "half-baked"?

This is sort of like saying "add all the words together and express the answer as modulo 1", only not quite as much.

Incidentally, your father's story is not quite so far-fetched. The IBM machines that were used in the Manhattan project were required to be modified such that, if a particular combination of numbers was fed to them, small gunpowder (or maybe it was guncotton) charges in the programming switches and card stacks would be ignited by little coils. This was to ensure that the machines could be decommissioned in an emergency if there was any risk of them being captured, leaving no clue as to the calculations they had been doing. Some of the later German Enigma machines had something similar.
-- MaxwellBuchanan, Oct 14 2014

I feel your account could be interpreted as a pun [MB], but I would like it to be true so I choose to believe that it's so. According to my trusty Jupiter Ace, bad-ass, or rather 36 BASE C! BAD ASS - . , comes out as "hl", so it's OK for that, though I take your point.
-- nineteenthly, Oct 14 2014
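
The clash worry is easy to check with a few lines of Python. This is only a sketch: diff36 is a hypothetical helper, and "cat-bat" and "dog-cog" are made-up hyphenations picked because they happen to collide.

```python
def diff36(term: str) -> int:
    """Base-36 value of the first part minus the second part."""
    first, second = term.split("-", 1)
    return int(first, 36) - int(second, 36)

print(diff36("bad-ass"))                        # 633, which is "hl" in base 36
print(diff36("bad-ass") == diff36("half-baked"))  # False
print(diff36("cat-bat") == diff36("dog-cog"))   # True: both differences are 1296
```

So "bad-ass" and "half-baked" stay distinct, but subtraction is many-to-one, and any two pairs that differ by the same amount come out as the same "word".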

Also, maximum could be interpreted as (ma) multiplied by (imum). And similarly, either/or could be (either) divided by (or). And, pushing it a little, you could consider t to be similar to +, and so a word such as motor-car could be constructed as (mo)+(or)-(car).
-- pocmloc, Oct 14 2014

//According to my trusty Jupiter Ace//

Wowww. I have to say, I am impressed. You program it in Forth?
-- MaxwellBuchanan, Oct 14 2014

//-6hu...shorter when written but not said//

Not necessarily. I presume you are pronouncing 6hu as six-aytch-yoo. However, if you put on your fake Japanese accent and pronounce it 'sixhoo', all is resolved.
-- DrBob, Oct 15 2014

Why are you bothering with numerics? Just use the letters for base 26.

"hi there", being an addition, would be "thfcn".

"e-mail" would be "-maig".
-- FlyingToaster, Oct 15 2014
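
A letters-only version could look something like the sketch below. The comment does not say how the letters map to digits, so this assumes a=1 through z=26 (a "bijective" base 26 with no zero digit); word_value and value_word are hypothetical names.

```python
def word_value(word: str) -> int:
    """Letters-only value, assuming a=1 ... z=26."""
    n = 0
    for ch in word.lower():
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n

def value_word(n: int) -> str:
    """Inverse of word_value; zero has no spelling, so it returns ''."""
    sign = "-" if n < 0 else ""
    n = abs(n)
    letters = []
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters.append(chr(ord("a") + r))
    return sign + "".join(reversed(letters))

print(value_word(word_value("e") - word_value("mail")))    # -maig
print(value_word(word_value("hi") + word_value("there")))  # thezn
```

Under this assumed mapping "e-mail" does come out as "-maig", but "hi there" gives "thezn" rather than "thfcn", so the comment may have been using a different convention for the addition.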

Weirdly, this overlaps something I've been thinking about for a while now. I am trying to connect similar entities from two huge (separate) tables. If I take the numeric columns and transform them into a normalised range of values, I can plot them n-dimensionally (where n is the number of numeric columns) and identify which ones are closest to one another by calculating the hypotenuses and identifying nearest neighbours. The problem is that key columns that need to be compared are non-numeric, but still have a concept of "closeness". My thought was to convert each of these strings into a vector, or series of vectors, each of which corresponds to the distance between each key on a qwerty-style keyboard. Thus, the word "SAD" (all on the same Y-axis) might resolve to a vector of (x,y)->(1,0) (start at S, x-1 for A, x+2 for D), while "GOOD" would be (x,y)->(G) (O:x+4,y+1) (O:0,0) (D:x-6,y-1) = (x,y)->(-2,0). If I also increment the z-axis for each character, the result is a wiggly spiral stretching off into the z-direction. Intuitively, I imagine these crystallisations or shapes ought to be relatively unique, but actually there are quite a few clashes, for words that are not that similar. It's been a while since I ran it. Anyway, the point is, the overlap with the idea here is the resolution of a word down to some kind of base, or integer value (since the wiggly word-shapes can be resolved to a single vector in 3-d space, the length of which might be a way of identifying that word and plotting it in a word-space that somehow put like-words together - IF - some method can be found that does that neatly without silly clashes - AND - and this is where I eventually gave up - allowed some kind of semantic dimension to give puns and similes a means to be identified).
-- zen_tom, Oct 15 2014
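
The keyboard-vector idea can be sketched roughly as follows. Everything about the layout here is an assumption: x is the plain column index within a row (ignoring the physical stagger), y is +1 for the top letter row, 0 for the home row and -1 for the bottom row, and z goes up by one per step. The names steps and net_vector are hypothetical.

```python
ROWS = {1: "qwertyuiop", 0: "asdfghjkl", -1: "zxcvbnm"}  # y value -> keys
POS = {ch: (x, y) for y, row in ROWS.items() for x, ch in enumerate(row)}

def steps(word):
    """Per-letter (dx, dy) moves between consecutive keys."""
    pts = [POS[c] for c in word.lower()]
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]

def net_vector(word):
    """Sum of the moves, with one z unit added per step."""
    moves = steps(word)
    return (sum(dx for dx, _ in moves), sum(dy for _, dy in moves), len(moves))

print(steps("sad"))        # [(-1, 0), (2, 0)]
print(net_vector("sad"))   # (1, 0, 2)
print(net_vector("good"))  # (-2, 0, 3)
print(net_vector("sad") == net_vector("sod"))  # True
```

The SAD and GOOD figures match the ones in the comment. Because the moves simply add up, the net vector depends only on the first key, the last key and the word length, which may be one reason for the clashes mentioned: under these assumptions "sad" and "sod" land on the same point.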

I do program it in FORTH, yes. It's fairly typical of me to back the loser but it was the first item I ever bought with money I earnt myself. It was just simpler than trying to work it out on the Windows calculator. Regarding the number base, I didn't think of that but I did think of using 1 and 0 as letter substitutes to reduce the size of the numbers and so also the calculations involved. [zen_tom], I'll get back to you.
-- nineteenthly, Oct 16 2014

OK, [zen_tom], is that not a function rather than a value? I may have misunderstood, but I have the impression you're describing something which can be plotted as a series of connected diagonal line segments in three-dimensional space. Have I got it right?
-- nineteenthly, Oct 16 2014

If we follow this usage of punctuation as a mathematical operator, some of the results would be enormous!
-- MaxwellBuchanan, Oct 16 2014

0. 1. There, I've said it all. Anything else is simply redundant.
-- RayfordSteele, Oct 16 2014

[nineteenthly] it depends on your definition - let's just say that I'd like to get to a stage where I can arrive at a single point in multi-dimensional space, which I can then measure against another such point to get a scalar quantity that tells me how similar two words, or even entire strings of text, are.

The twisty word-shapes are kind of cute to consider and work as handy visualisations, but it's that final numeric value that I'm ultimately aiming towards. (Though, it might be interesting to consider the shapes formed and see if they provide any geometrical methods of clustering the data formed - clockwise words, anti-clockwise, twisty ones, straight ones, words that form into loops, or which just wiggle along - there's bound to be a number of ways of categorising them other than resolving them to a single scalar magnitude)

I tried Hamming and Levenshtein distances, but the problem there is that each distance has to be against something else - to find a true measure, you have to calculate Hamming/Levenshtein distances from each string to each other string - the number of comparisons grows with the square of the number of strings - if I can map any word into a space (and for me, yes that would be a function) without reference to any other word, then I can do that in much less time, and it will scale for large volumes. The hard part is defining that space such that like-words appear relatively close to one another. And I think the hard part in doing that is that there are so many different ways that two words can be alike. One way might be typing, so "awful" and "awfuk" are very close in that sense (assuming a particular keyboard layout, of course). And another might be "ear" and "shell-like", assuming some form of cockney dimension is incorporated into the model.
-- zen_tom, Oct 16 2014
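
To make the scaling point concrete, here is a small sketch of the pairwise approach: a textbook dynamic-programming Levenshtein distance, run over every pair in a tiny word list. The word list is just the examples from the comment, and levenshtein is written out by hand rather than taken from any particular library.

```python
from itertools import combinations

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

words = ["awful", "awfuk", "ear", "shell-like"]
pairs = list(combinations(words, 2))
print(levenshtein("awful", "awfuk"))  # 1
print(len(pairs))                      # 6 pairs for 4 words
```

With n strings there are n(n-1)/2 pairs to score, so the work grows quickly for huge tables, whereas a mapping that places each word in a space on its own needs only one pass over the words before any nearest-neighbour search.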

I think you need something like a chaos plot. It works very nicely for DNA sequences (four letters), but I don't know if there's any reason it wouldn't work for a larger alphabet.
-- MaxwellBuchanan, Oct 16 2014
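
Assuming "chaos plot" refers to what is usually called the chaos game representation, a rough sketch for DNA looks like this: each base owns a corner of the unit square, and each letter moves the current point halfway towards its corner. The corner assignment and the centre starting point are conventions chosen for this example, and chaos_game is a hypothetical name.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def chaos_game(seq, start=(0.5, 0.5)):
    """Return the trail of points visited for a sequence over A, C, G, T."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to that base's corner
        points.append((x, y))
    return points

print(chaos_game("ACGT"))
# [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125), (0.78125, 0.40625)]
```

The final point encodes the whole sequence, with recent letters carrying the most weight. A 26-letter alphabet would need some other arrangement of attractor points, which this sketch does not attempt.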

Yes, I've heard of similar uses/approaches used for DNA sequencing - not heard of a chaos plot till now - that will be something to look up tomorrow.

The other possibly interesting thing about mapping words onto numeric spaces, is that it's possible to calculate things like averages, outliers etc. Even more interesting would be if the mapping function worked both ways - i.e. if a word goes in and translates to an n-dimensional set of coordinates {a,b,c,d,e,f,...n} then what do you get if you turn it around, feed in a set of coordinates? Entirely new and alien words could be discovered this way.
-- zen_tom, Oct 16 2014

//0. 1. There, I've said it all. Anything else is simply redundant.//

Hmm, by removing the zeroes in this fashion, with 1=0, 11=1 and so on, you could save 50% of the bandwidth.

I have this nasty feeling that might just work with enough checksums, and a deity with a strange sense of humour.
-- not_morrison_rm, Oct 16 2014

I am afraid [zen_tom]'s scheme glimmers with manic madness to me and I had to stop reading it.

But I like [nineteenthly]'s idea fine. A problem I have is that it is hard to go back: there is a loss of value in subtraction and multiple possible word partners could have the same result.

I propose instead that the entire word without hyphen be converted to a number, and that number reduced to its cube root. Big languages like German could use the 4th or 5th root. Halfbakery or 2045128869 becomes 1269.33 or abfi.cc. The decimal point is pronounced as a cough. If you want the original back, easy - just cube it and there you go. Of course what comes back has no hyphen but they are pretentious anyway.
-- bungston, Oct 16 2014
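
The cube-root figures can be checked with a quick sketch. The letter mapping (1=a, 2=b and so on, per decimal digit) is inferred from "abfi.cc" and is an assumption; it has no letter for the digit 0.

```python
n = 2045128869                 # "20451" and "28869" written side by side
root = round(n ** (1 / 3), 2)  # keep two decimal places, as in the comment
print(root)                    # 1269.33

# Spell each decimal digit as a letter (assumed mapping: 1=a, 2=b, ...).
spelled = "".join(ch if ch == "." else chr(ord("a") + int(ch) - 1)
                  for ch in str(root))
print(spelled)                 # abfi.cc

print(round(root ** 3) == n)   # False: the rounding has thrown information away
```

So getting the original back by cubing only works if enough decimal places are kept, which rather eats into the saving.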

//no hyphen but they are pretentious anyway.//

I would quibble with that but I have to -
-- not_morrison_rm, Oct 16 2014

//glimmers with manic madness// [bungston], that's actually a very helpful point - in meetings, when walking through a possible course of action, especially when it's a particularly interesting one, there has been a tendency for the audience to glaze over - I suspect it may be at the point where mania is detected, and I may need to consider ways of shielding my colleagues from the more manic stuff, as it rarely sticks - until I build it myself and they see it working on a "don't need to know how that works under-the-covers" basis.
-- zen_tom, Oct 17 2014

//I build it myself and they see it working//

Absolutely, [zen_tom]. Skunkworks. Ask forgiveness, not permission. Or go to work with more intelligent people, if you can find them.
-- pertinax, Oct 18 2014

//This means that a word such as "bad" would have a value of 14629. Therefore a word such as "half-baked" would then be the numbers 20451 and 28869.//

How do you figure? I get 806883 - 18968773, or "-at9sy". Negative at ninesy.

BTW, the Ruby programming language can do math like this very easily. When you convert a string to an integer or vice versa, you can specify a base between 2 and 36. So the statement ( "half".to_i(36) - "baked".to_i(36) ).to_s(36) will perform this half-baked arithmetic.
-- ytk, Oct 19 2014

FORTH does it that way too. It has an 8-bit variable called BASE. I may be wrong. Basically, I didn't go to the extra effort of writing a double-length integer printing word, which would've been easy.
-- nineteenthly, Oct 20 2014
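
The two sets of figures can be reconciled with a short sketch. It assumes single-length integers are 16 bits wide, so only the low 16 bits of each value survive; whether that is exactly what happened on the Jupiter Ace is a guess, but the numbers line up.

```python
MASK = 0xFFFF  # keep only the low 16 bits (assumed single-length cell size)

half = int("half", 36)     # 806883
baked = int("baked", 36)   # 18968773

print(half & MASK, baked & MASK)        # 20451 28869
print((half & MASK) - (baked & MASK))   # -8418
print(int("6hu", 36))                   # 8418, so the wrapped result is "-6hu"
print(half - baked == -int("at9sy", 36))  # True: the full-width result is "-at9sy"
```

In other words, 20451 and 28869 are the full values reduced modulo 65536, and "-6hu" is the difference of those reduced values.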

FORTH brings back memories - I never got the hang of FORTH but on the other hand, I loved programming in LISP. Different people's minds work in different ways, I guess. This was all around the time I was being employed as a part-time Pascal/VMS programmer (it seems weird now to think that people actually paid me to do programming - I am not a very good programmer...).
-- hippo, Oct 20 2014

While we're on this programming-language tangent, I've been playing with Python recently - and love it - one favorite thing that it has is a full and deep set-processing facility built in. So you can define a set
a = {1,2,3,4,5}
and another one
b = {3,4,6}
And then ask
c = a.intersection(b)
d = a.union(b)
e = a.difference(b)
And expect the results
c = {3,4}
d = {1,2,3,4,5,6}
e = {1,2,5}
And so on, only if you like, you can create sets of characters, or documents, or objects, whatever you like. It's a nice facility I've not seen embedded within a programming language before, which saves lots of time writing/porting/managing different iterative/search routines. It's a nice way to formulate and resolve problems sometimes - especially when you need to explain your code to a pointy-haired person who responds only to basic Venn-diagrams.
-- zen_tom, Oct 20 2014

That's pretty common for modern programming languages, especially (so-called) scripting languages like Python and Ruby. Ruby's version is even terser:

a & b — intersection
a | b — union
a - b — difference

Also, realize that what you think of as being “embedded” within a programming language really means it's just part of the standard library. You can very easily add that functionality to most programming languages by either creating a custom array object or modifying the built-in array. You could also technically remove it from the language, but there's really no reason to do so.
-- ytk, Oct 20 2014

Hmm, it's interesting - these days, there's a library for anything, and if you don't like how those folks did it, you can fork, or write one yourself. I've done the same thing myself. What's nice about having something like this deep in the "core" packages is that other libraries are informed and are shaped by its existence.
-- zen_tom, Oct 20 2014

!=factorial
-- MaxwellBuchanan, Oct 21 2014

what is not equal to factorial?
-- hippo, Oct 21 2014

<>!
-- MaxwellBuchanan, Oct 21 2014
