Great work there from Mike -- thanks. That's enough to convince me
that length is definitely preferable to end-position. I maintain my
preference for zero-based indexing, but I don't feel strongly about
it. As I said before (but the immediately contradicted by including
poorly edited old text :-) I'm quite happy to simplify by saying the
negative numbers are Just Pain Not Allowed.
_/|_ ___________________________________________________________________
/o ) \/ Mike Taylor <[log in to unmask]> http://www.miketaylor.org.uk
)_v__/\ "Writing is like driving a car at night. You can only see as
far as the headlights, but you make the whole trip that way" --
E. L. Doctorow.
Mike Rylander writes:
> On 8/17/06, Mike Taylor <[log in to unmask]> wrote:
> > Ray Denenberg writes:
> > > I'm trying to track down why we changed it from length to position.
> > > The discussion was among the Ed. Board, and as I recall there was a
> > > good reason but I haven't tracked it down yet (in the Ed. Board
> > > archive). If I can't find it we'll go with position. (Thoug that
> > > does destabalize the proposal.) The zero-base issue is more
> > > difficult, as it seems there is fairly overwhelming support for
> > > 1-based as far as I can see.
> >
> > I only agreed to 1-based because someone on list list asserted that
> > it's what most substring operations use. But if Mike Rylander would
> > like to repeat his helpful survey of other environments' substring
> > operations, I'd support whichever index-base is most widely used
> > elsewhere.
>
> Now look what I've gotten myself into ... ;)
>
> I'll go through the environments I use most, in no particular order whatsoever.
>
> There is some amount of discordance between different SQL
> implementations regarding the index base of the start position.
>
> Postgresql uses 1-based indexes in order to provide consistency with
> ordinal tuple position[1]. It does not allow negative length, but
> using negative (or zero) start position will place the start logically
> before the beginning of the string, thusly:
>
> poop=# select substr('123456',-2,5);
> substr
> --------
> 12
> (1 row)
>
> While that does have a brand of internal consistency, I wouldn't
> promote it here.
>
> Oracle[2], while being mostly 1-based also, allows a start position of
> zero which means exactly the same thing as 1. I think this is worse
> than Pg's plan, since there's less consistency.
>
> Just to add more inconsistency, MySQL[3] has decided to allow negative
> positions and uses that to mean "from the end of the string", however,
> the second parameter is still always length. They use 1-based start
> positions, and don't say anything about zero (and I don't use MySQL),
> so I can't comment on that piece.
>
> So that covers SQL implementations, but it doesn't really give much
> direction since there is so much disagreement over details. Arguably,
> Pg is the closest to the SQL spec, but the spec does not constrain
> extensions in most cases, so it's hard to point to anything solid,
> other than nominally 1-based with length.
>
> On to programming languages!
>
> We've seen Perl, PHP, C/C++ (0-based, length) and Python (no substr,
> just slice) ... another currently popular language, Ruby[4], uses
> 0-based start position and length for substr, and /also/ uses 0-based
> start position for slices when the '[]' method is handed a range (
> x..y ) instead of two numbers. Java[5], trying to be a rebel, uses
> the second param in its .substring() method for end-position (that's a
> /slice/, Sun!), but it is also 0-based. XPath[6] uses 1-based start
> position and length.
>
> So here are my ill considered, biased conclusions:
>
> Traditionally, string indexing is 0-based. This comes from looking at
> strings as arrays of characters, and for the purpose of extraction
> (with the goal of matching), this feels "right" to me. Those that
> buck real-programming convention and use 1-based position are doing so
> for local consistency (SQL -- 1-based ordinal position) or just
> because (Java -- any idea?) or because they are following one (or
> both) of the 1-based crowd (XPath -- following Java, and attempting to
> be a query language like SQL).
>
> Being a programmer, I'm more comfortable with 0-based string indexing,
> so that's what my heart would prefer. However, we are designing a
> query language here, so it makes sense to follow the convention of
> other query languages. Both SQL and XPath/XSLT have 1-based substring
> implementations, so I'm not going to put up too much of a fight to get
> 0-base substrings. I would, however, urge everyone to discard the
> idea of "logical" starting positions -- no zero or negatives, please!
> There's just too much confusion over what they should mean, and there
> are other methods that can be employed for providing "match the end of
> the string" stuff.
>
> As to lenghth vs end-position, every single implementation of substr
> uses length, with the exception of slice methods that happen to be
> spelled "substr". IMHO, if an end-position version of this method is
> desired/desirable, it should be called "slice". Note, however, that
> you can implement slice via substr by saying
>
> slice( string, substr-start, substr-start + substr-len - 1 )
>
> and substr via slice with
>
> substr( string, slice-start, slice-end - slice-start + 1 )
>
> So, if that's how you want to do it under the covers, or your language
> of choice doesn't have a proper substr implementation, you can
> transform to slice in the backend.
>
>
> 1) http://www.postgresql.org/docs/8.1/interactive/functions-string.html
> 2) http://www.adp-gmbh.ch/ora/sql/substr.html
> 3) http://mysql.com/doc/refman/5.0/en/string-functions.html
> 4) http://www.rubycentral.com/book/ref_c_string.html#String._ob_cb
> 5) http://mindprod.com/jgloss/substring.html
> 6) http://www.topxml.com/xsl/funcsubstring.asp
>
>
> >
> > (And for the same reason I prefer length to end.)
> >
> > _/|_ ___________________________________________________________________
> > /o ) \/ Mike Taylor <[log in to unmask]> http://www.miketaylor.org.uk
> > )_v__/\ "He looks around; he sees angels in the architecture, spinning
> > in infinity" -- Paul Simon, "Call Me Al"
> >
>
>
> --
> Mike Rylander
> [log in to unmask]
> GPLS -- PINES Development
> Database Developer
> http://open-ils.org
|