Great work there from Mike -- thanks. That's enough to convince me that length is definitely preferable to end-position. I maintain my preference for zero-based indexing, but I don't feel strongly about it. As I said before (but the immediately contradicted by including poorly edited old text :-) I'm quite happy to simplify by saying the negative numbers are Just Pain Not Allowed. _/|_ ___________________________________________________________________ /o ) \/ Mike Taylor <[log in to unmask]> http://www.miketaylor.org.uk )_v__/\ "Writing is like driving a car at night. You can only see as far as the headlights, but you make the whole trip that way" -- E. L. Doctorow. Mike Rylander writes: > On 8/17/06, Mike Taylor <[log in to unmask]> wrote: > > Ray Denenberg writes: > > > I'm trying to track down why we changed it from length to position. > > > The discussion was among the Ed. Board, and as I recall there was a > > > good reason but I haven't tracked it down yet (in the Ed. Board > > > archive). If I can't find it we'll go with position. (Thoug that > > > does destabalize the proposal.) The zero-base issue is more > > > difficult, as it seems there is fairly overwhelming support for > > > 1-based as far as I can see. > > > > I only agreed to 1-based because someone on list list asserted that > > it's what most substring operations use. But if Mike Rylander would > > like to repeat his helpful survey of other environments' substring > > operations, I'd support whichever index-base is most widely used > > elsewhere. > > Now look what I've gotten myself into ... ;) > > I'll go through the environments I use most, in no particular order whatsoever. > > There is some amount of discordance between different SQL > implementations regarding the index base of the start position. > > Postgresql uses 1-based indexes in order to provide consistency with > ordinal tuple position[1]. It does not allow negative length, but > using negative (or zero) start position will place the start logically > before the beginning of the string, thusly: > > poop=# select substr('123456',-2,5); > substr > -------- > 12 > (1 row) > > While that does have a brand of internal consistency, I wouldn't > promote it here. > > Oracle[2], while being mostly 1-based also, allows a start position of > zero which means exactly the same thing as 1. I think this is worse > than Pg's plan, since there's less consistency. > > Just to add more inconsistency, MySQL[3] has decided to allow negative > positions and uses that to mean "from the end of the string", however, > the second parameter is still always length. They use 1-based start > positions, and don't say anything about zero (and I don't use MySQL), > so I can't comment on that piece. > > So that covers SQL implementations, but it doesn't really give much > direction since there is so much disagreement over details. Arguably, > Pg is the closest to the SQL spec, but the spec does not constrain > extensions in most cases, so it's hard to point to anything solid, > other than nominally 1-based with length. > > On to programming languages! > > We've seen Perl, PHP, C/C++ (0-based, length) and Python (no substr, > just slice) ... another currently popular language, Ruby[4], uses > 0-based start position and length for substr, and /also/ uses 0-based > start position for slices when the '[]' method is handed a range ( > x..y ) instead of two numbers. Java[5], trying to be a rebel, uses > the second param in its .substring() method for end-position (that's a > /slice/, Sun!), but it is also 0-based. XPath[6] uses 1-based start > position and length. > > So here are my ill considered, biased conclusions: > > Traditionally, string indexing is 0-based. This comes from looking at > strings as arrays of characters, and for the purpose of extraction > (with the goal of matching), this feels "right" to me. Those that > buck real-programming convention and use 1-based position are doing so > for local consistency (SQL -- 1-based ordinal position) or just > because (Java -- any idea?) or because they are following one (or > both) of the 1-based crowd (XPath -- following Java, and attempting to > be a query language like SQL). > > Being a programmer, I'm more comfortable with 0-based string indexing, > so that's what my heart would prefer. However, we are designing a > query language here, so it makes sense to follow the convention of > other query languages. Both SQL and XPath/XSLT have 1-based substring > implementations, so I'm not going to put up too much of a fight to get > 0-base substrings. I would, however, urge everyone to discard the > idea of "logical" starting positions -- no zero or negatives, please! > There's just too much confusion over what they should mean, and there > are other methods that can be employed for providing "match the end of > the string" stuff. > > As to lenghth vs end-position, every single implementation of substr > uses length, with the exception of slice methods that happen to be > spelled "substr". IMHO, if an end-position version of this method is > desired/desirable, it should be called "slice". Note, however, that > you can implement slice via substr by saying > > slice( string, substr-start, substr-start + substr-len - 1 ) > > and substr via slice with > > substr( string, slice-start, slice-end - slice-start + 1 ) > > So, if that's how you want to do it under the covers, or your language > of choice doesn't have a proper substr implementation, you can > transform to slice in the backend. > > > 1) http://www.postgresql.org/docs/8.1/interactive/functions-string.html > 2) http://www.adp-gmbh.ch/ora/sql/substr.html > 3) http://mysql.com/doc/refman/5.0/en/string-functions.html > 4) http://www.rubycentral.com/book/ref_c_string.html#String._ob_cb > 5) http://mindprod.com/jgloss/substring.html > 6) http://www.topxml.com/xsl/funcsubstring.asp > > > > > > (And for the same reason I prefer length to end.) > > > > _/|_ ___________________________________________________________________ > > /o ) \/ Mike Taylor <[log in to unmask]> http://www.miketaylor.org.uk > > )_v__/\ "He looks around; he sees angels in the architecture, spinning > > in infinity" -- Paul Simon, "Call Me Al" > > > > > -- > Mike Rylander > [log in to unmask] > GPLS -- PINES Development > Database Developer > http://open-ils.org