h a l f b a k e r y
Bunned. James Bunned.
It took mankind quite a while to come up with the zero, but it's been a huge success ever since. I propose a new digit, meaning "unknown". Unknown is not the same as zero: the exact number 198,000 is currently indistinguishable from the estimate 198,000 unless you specifically add an uncertainty factor (done in some scientific realms, but unwieldy and mostly ignored elsewhere). The new digit would let you know at a glance the number of significant figures. 198,XXX tells you that you just don't know (or maybe don't care) about the digits beyond the 8. Of course, we could keep the rigorous and precise uncertainty for scientific pursuits, but this would be useful anywhere.
I came upon this idea when forced to choose a date for a picture from the past when I scanned it into my computer. Despite only knowing that it was taken in June of '98, I had to type in 6/1/1998. Some future descendant will think this was a precise date, and perhaps base assumptions on that bad information. Had I the option, 6/X/1998 would have been preferred. In other cases, X/X/1998 or even X/X/199X would have been appropriate.
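As a rough illustration of the notation proposed above, the sketch below reads a number written with the unknown digit and reports how many digits are actually known, along with the range of values it could stand for. The function name `parse_unknown` and the choice of `X` as the placeholder are assumptions made for the example, and only trailing unknowns are handled.

```python
def parse_unknown(text, placeholder="X"):
    """Return (known_digits, low, high) for a number such as '198,XXX'.

    Hypothetical helper: only trailing placeholders are supported.
    """
    digits = text.replace(",", "")
    known = digits.rstrip(placeholder)
    if not known.isdigit():
        raise ValueError(f"unsupported number: {text!r}")
    unknown_count = len(digits) - len(known)
    low = int(known) * 10 ** unknown_count   # every unknown digit is 0
    high = low + 10 ** unknown_count - 1     # every unknown digit is 9
    return len(known), low, high


print(parse_unknown("198,XXX"))  # (3, 198000, 198999)
print(parse_unknown("198,000"))  # (6, 198000, 198000)
```

The second call shows the distinction the idea is after: the same leading digits, but six known figures instead of three.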
The problem with new 'sensible' standards.
http://xkcd.com/927/ [AusCan531, Jul 03 2012]
http://www.cambridge.org/9780521424639
[pocmloc, Jul 03 2012]
|
|
There'd need to be some convention as to how "X"
behaved in sorting, ranking etc - maybe it could be
assigned a value of 0. Actually maybe not, since
that would cause problems with the "0th of July", or
"month 0". Maybe have it count as a 1, since 1 can
exist in any position in a date. |
|
|
Yes, but zero only works after the decimal. How do
you distinguish 150 from one-hundred-and-fifty-
something? |
|
|
When the height of Everest was first measured
accurately, the poor surveyor found it to be 29,000ft
exactly. Fearing that people would assume he had
just estimated it to the nearest thousand feet, he
reported it as 29,002ft. |
|
|
<random blather>
As [bs] mentioned, "?" is already in use.
A squiggle can be used to denote approximation: ~186,000.
Database designers can subdivide a field into placeholders. So where does the "?" or "X" go in the sort order?</rb> |
|
|
There's already a very useful concept of nullity in databases. This is an elegant refinement of it. [+] |
|
|
The tomb of the unknown digit, being the smallest site of remembrance within Her Majesty's Commonwealth. |
|
|
[FT] This is not a let's all. Yes, for it to be useful it would have to be generally adopted, but the idea is solid without the general adoption. |
|
|
As for sorting, it should generally sort as if it had zeros in all the X columns, as this would place it near the nominal center of the uncertainty. |
|
|
//sort as if it had zeros in all x columns// |
|
|
But then how do "12340" and "1234X" sort, relative to
each other? |
|
|
In an Oracle environment, 1234X would come last; in a Microsoft environment, it would come first. Come on, you knew that really, didn't you? |
|
|
I can digit. If not sigit or figit. |
|
|
//Despite only knowing that it was in June of '98, I
had to type in 6/1/1998// |
|
|
I think I see your problem... you've tagged it as
the 6th day of January, 1998. |
|
|
You'd have been better off marking it 19980601, to
give you 1 June 1998, with the added benefit of
being able to sort all properly appended dates
chronologically. |
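A quick way to see this is to sort the same three dates as plain text in each format. The sample dates are made up for the illustration, and the sketch only assumes that dates are stored as zero-padded strings.

```python
from datetime import date

dates = [date(1998, 6, 1), date(1997, 12, 25), date(1998, 1, 6)]


def sorts_chronologically(fmt):
    """True if sorting the formatted strings as text gives chronological order."""
    as_text = sorted(d.strftime(fmt) for d in dates)
    in_order = [d.strftime(fmt) for d in sorted(dates)]
    return as_text == in_order


print(sorts_chronologically("%Y%m%d"))    # year-month-day: True
print(sorts_chronologically("%m/%d/%Y"))  # month/day/year: False
print(sorts_chronologically("%d/%m/%Y"))  # day/month/year: False
```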
|
|
But this is extraordinarily close to proper usage of significant figures. |
|
|
Your 198,000 example (being either 198000 +/- 1000 OR 198500 +/- 500 - you didn't say) is better expressed as 1.98E5. Eliminate the unknown information altogether. |
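For what it's worth, most languages can already print a value to a chosen number of significant figures in this style. A minimal sketch in Python, where `to_sig_figs` is a name invented for the example:

```python
def to_sig_figs(value, sig_figs):
    """Format value in scientific notation with sig_figs significant figures."""
    # The 'e' format keeps one digit before the point,
    # so ask for sig_figs - 1 digits after it.
    return f"{value:.{sig_figs - 1}e}"


print(to_sig_figs(198000, 3))  # 1.98e+05
print(to_sig_figs(198000, 6))  # 1.98000e+05
```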
|
|
As a former North American who used to write dates as Month/Day/Year but now conforms to the more sensible Day/Month/Year format, I agree with [Unabubba] that the Year/Month/Day format is the most sensible of all, as it sorts in chronological order. The XKCD [link] does point out the problem with adopting the most sensible solution. |
|
|
As for the //how do "12340" and "1234X" sort, relative to each other?// question, "12340" should sort first, as "X" is either equal to zero or greater (with a 90% chance of being greater). |
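One possible reading of that rule, as a sketch: treat each unknown digit as zero for the main comparison, and on a tie put the number with fewer unknown digits first. The name `sort_key` and the tie-break are assumptions for the example, not an agreed convention.

```python
def sort_key(text, placeholder="X"):
    """Sort key: unknown digits count as zero; on a tie, the number with
    fewer unknown digits comes first (an assumed tie-break)."""
    digits = text.replace(",", "")
    value = int(digits.replace(placeholder, "0"))
    return (value, digits.count(placeholder))


numbers = ["1234X", "12350", "12340", "123XX", "12339"]
print(sorted(numbers, key=sort_key))
# ['123XX', '12339', '12340', '1234X', '12350']
```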
|
|
As for the original idea, I wish I could give a 2nd bun. |
|
|
[Aus] Not true, though I made the same mistake
initially. X is actually equal to 0 +/-5, it's
uncertainty, not unknown. |
|
|
//0 ±5// ackshully X (though I still think ? is more accurate) should slot in as 0.49999999... . |
|
|
Yes, you're quite correct [MechE] |
|
|
//As a former North American who used to write dates as
Month/Day/Year but now conforms to the more sensible
Day/Month/Year format I agree with [Unabubba] that the
Year/Month/Day format is the most sensible of all as it is
sortable by chronological order.// |
|
|
Actually, the D-M-Y format is the least sensible of all. The
One True Date Format is indeed Y-M-D, which means that M-
D-Y conforms more closely to the way dates should be
written in order to sort properly. The only thing D-M-Y has
going for it is internal consistency, but it is far better to be
more correct than more consistent in this case. At least with
M-D-Y, dates within a given year will sort correctly. |
|
|
Incidentally, when I switched to generally writing dates in Y-
M-D format, I eventually discovered an unexpected benefit:
In the early part of the year, I noticed that I never
absentmindedly wrote down the previous year as long as I was
writing the year first. |
|
|
//ackshully X (though I still think ? is more accurate) should
slot in as 0.49999999...// |
|
|
0.49999999... is exactly equal to 0.5. |
|
|
Can you write *infinity* in Roman Numerals?
They don't have a zero, which makes it very confusing...but I do like this idea. [+] |
|
|
What [Custardguts] said - existing scientific notation covers this quite nicely, and, since it was formalised in IEEE 754, it has been the technical standard for all floating point arithmetic since 1985. |
|
|
With three parts (a sign, a coefficient, and an exponent) you can describe any number you wish, to any level of precision you like. E.g. +123XX would be written as +1.23 x 10^4, providing both a workable value and an indication of the known degree of accuracy. |
|
|
It would be neat to apply the same concept to dates as well - although that is already sort of done to some extent in dates vs datetimestamps if only by convention. |
|
|
// It could even vary depending on what program
you're running.// |
|
|
Which would be very bad. The beauty of the idea
here is that there would be a standard character
which can stand in for any digit, but which would
have the same breadth of validity as the digits 0-9. |
|
|
The closest to a widely-accepted convention would
be "?", or perhaps "*". |
|
|
Per your example with a date format, I have seen and used systems that allowed for unknown date data by using zero in place of the unknown element. In this case you could enter 6/0/1998 and the system, recognizing the zero as indicating an unknown day, would display "June 1998" as the date. These were specialized systems, however, and I don't know how widespread this practice is. |
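A sketch of how such a system might behave, assuming month/day/year input with 0 standing for an unknown month or day. The helper names `describe_partial_date` and `possible_days` are invented for the example and do not come from any particular product.

```python
import calendar


def describe_partial_date(month, day, year):
    """Render a date in which 0 marks an unknown month or day.

    Hypothetical helper; the year is assumed to be known.
    """
    if month == 0:
        return str(year)                # only the year is known
    month_name = calendar.month_name[month]
    if day == 0:
        return f"{month_name} {year}"   # day unknown
    return f"{month_name} {day}, {year}"


def possible_days(month, year):
    """First and last day numbers that a '<month>/0/<year>' entry could mean."""
    return 1, calendar.monthrange(year, month)[1]


print(describe_partial_date(6, 0, 1998))  # June 1998
print(possible_days(6, 1998))             # (1, 30)
```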
|
|
Outside of dates, I have seen "?" used frequently for unknown digits. |
|
|
[bigsleep], shouldn't the percentage error remain the same if you multiply by a known number? I.e.
4x(198000±500)=792000±2000 |
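The arithmetic can be checked directly: multiplying by an exactly known factor scales the absolute uncertainty by that factor and leaves the relative uncertainty unchanged. A small sketch, with `scale` as a made-up helper name:

```python
from fractions import Fraction


def scale(value, uncertainty, factor):
    """Multiply a measurement (value ± uncertainty) by an exactly known factor."""
    return value * factor, uncertainty * abs(factor)


value, uncertainty = 198000, 500
scaled_value, scaled_uncertainty = scale(value, uncertainty, 4)

print(scaled_value, scaled_uncertainty)  # 792000 2000
# The relative uncertainty is the same before and after scaling.
print(Fraction(uncertainty, value) == Fraction(scaled_uncertainty, scaled_value))  # True
```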
|
|
//date [...] specialized systems// |
|
|
Yep, I've used genealogy software that can handle dates with "about", "before", "after", "between", and so forth. Been trying to chase down what the internal data format was; no luck so far. Suspecting stored-as-text, and processed with a lotta extremely ugly code. |
|
|
If you can write a consistent set of rules for sorting
fuzzy dates, the actual programming is fairly
simple. All you have to do is write a comparison
method for your fuzzy date object that will
determine whether it's less than, equal to, or greater
than another fuzzy date object. You can then sort
them like any other date. |
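A minimal sketch of that approach, assuming one particular rule set: each fuzzy date is stored as an earliest and a latest possible day, and dates are ordered by earliest day first, then by latest. The class name `FuzzyDate` and the ordering rule are assumptions for the example; other rule sets would only change the comparison method.

```python
from dataclasses import dataclass
from datetime import date
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class FuzzyDate:
    """A date known only to lie between earliest and latest (inclusive)."""
    earliest: date
    latest: date

    def _key(self):
        # Assumed rule: order by earliest possible date, then by latest.
        return (self.earliest, self.latest)

    def __lt__(self, other):
        if not isinstance(other, FuzzyDate):
            return NotImplemented
        return self._key() < other._key()


june_1998 = FuzzyDate(date(1998, 6, 1), date(1998, 6, 30))
year_1998 = FuzzyDate(date(1998, 1, 1), date(1998, 12, 31))
exact = FuzzyDate(date(1998, 6, 15), date(1998, 6, 15))

ordered = sorted([exact, june_1998, year_1998])
print([d.earliest.isoformat() for d in ordered])
# ['1998-01-01', '1998-06-01', '1998-06-15']
```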
|
|
Is it too late for a 'whose finger is that?' joke? |
|
|
Well... er, that _was_ the joke, actually. |
|
|
I liked the 'Array of negative size' post earlier last
month, but I don't really understand what's wrong with
the good ole 'X' in 198,XXX in this case. |
|
|
//what's wrong with the good ole 'X' in 198,XXX// |
|
|
That indicates an unknown variable, not uncertainty. That is, you would expect the result to be in the range 198,000 to 198,999. 198,?00, on the other hand, would provide a range of 197,500 to 198,499. |
|
|
//198,?00, on the other hand, would provide a range of 197,500 to 198,499.// Would it? How so? Would it not instead indicate that the answer was one of the following options:
198,000
198,100
198,200
198,300
198,400
198,500
198,600
198,700
198,800
198,900
and no other possibility? |
|
|
No, uncertainty is different than a variable. |
|
|
If I measure that a car is clocked at 90 miles per hour, with a device that has an uncertainty of +/- 2, it could be going anywhere from 88 to 92 mph. In standard engineering practice, your last digit is always uncertain. For decimal numbers (including scientific notation) the uncertainty is specified by the last written digit. In this case 1.980x10^5. What this idea allows is the ability to write non-decimal numbers (with trailing zeros) the same way. That is, 198,?00 has an uncertainty of +/-500, but 198,0?0 has an uncertainty of +/-50 (1.9800x10^5). |
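The two examples in this comment can be worked out mechanically. The sketch below assumes the reading used at this point in the thread: the marker counts as 0 in the nominal value, and the uncertainty is +/-5 in the marker's own decimal place. The function name `uncertainty_range` is invented for the illustration.

```python
def uncertainty_range(text, marker="?"):
    """Return (nominal, plus_minus) for a number with a single marker.

    Assumed reading: the marker counts as 0 in the nominal value, and the
    uncertainty is +/-5 in the marker's own decimal place.
    """
    digits = text.replace(",", "")
    if digits.count(marker) != 1:
        raise ValueError("expected exactly one uncertainty marker")
    place = len(digits) - digits.index(marker) - 1  # 0 = units, 1 = tens, ...
    nominal = int(digits.replace(marker, "0"))
    return nominal, 5 * 10 ** place


print(uncertainty_range("198,?00"))  # (198000, 500)
print(uncertainty_range("198,0?0"))  # (198000, 50)
```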
|
|
In writing that, however, it occurs to me that the uncertainty should append to the preceding digit. This means that 1.98x10^5 would be written as 198,?00, and have a possible range from 193,000 to 203,000. This could further be expanded by allowing the size of the uncertainty (up to +/-9) to be written after the uncertainty digit. This would allow for 198,?20 to cover a range of 196,000 to 200,000, for instance. For my car example it would be 90.?2. |
|
|
(European readers, please excuse the American decimal points and comma separators.) |
|
|
I think we've forked a bit: there's a distinction between |
|
|
"my driver's license number is one two three four five something seven eight nine" : 12345X789 |
|
|
and "approximately 18,000": 18,XXX |
|
|
Of course there is no variable, and it's not an equation. It's a number in scientific notation. But in an engineering measurement, there is uncertainty. If I measure something that is 1.98 meters long, with a device that is accurate to 2 centimeters, I don't know where it is between 1.96 and 2.00 m. If I measure it with something that is accurate to 2 millimeters, then I know it is between 1.978 and 1.982 meters. That is the significant figures application mentioned in the original post. And by standard practice, in scientific notation, the uncertainty is in the last written digit, exactly as I showed it in the post before that one. |
|
|
I'll admit that the date application mentioned calls for a variable, not an uncertainty, but yes, they are two different things. |
|
| |