INTERACT FORUM

Topic: Change Request: IsRange()  (Read 3143 times)

MrC

  • Citizen of the Universe
  • Posts: 10462
  • Your life is short. Give me your money.
Change Request: IsRange()
« on: January 28, 2013, 01:41:20 pm »

I posted a request here for posterity:

    http://yabb.jriver.com/interact/index.php?topic=62258.0

which we can discuss in this thread.

I tripped over the issue in the post above again, for the nth time.  It's a stinker, and the current implementation is not very useful.

I'd like to propose a change to IsRange() to Do the Right Thing.

   IsRange(arg, n-m,...)

   - Support ASCII collation
   - Never convert arg
   - If arg is more than one character and is not numeric, return FALSE
   - If arg is one or more digits, and n and m are both numeric, return n <= arg <= m
   - If arg is a single character, and n and m are both single characters, compare using ASCII collation
   - Support multiple single-character ranges: 0-9,a-z,A-Z (digits, letters) or !-/,:-@,[-`,{-~ (punctuation)
   - Escape dash with / in ASCII collation mode
   - Support negative ranges: -10-/-100
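A minimal Python sketch of the proposed rules, only to pin down the semantics; it is not JRiver's implementation, and `parse_range` and `is_range` are hypothetical names. Two details are my own assumptions: a dash with nothing before it is read as a minus sign, and numeric bounds are accepted in either order, so -10-/-100 covers -100 through -10.

```python
import re

# Hypothetical sketch of the proposed IsRange(arg, n-m, ...) rules; not JRiver code.
_NUMBER = re.compile(r"-?\d+")  # optional minus sign followed by digits only


def parse_range(spec: str) -> tuple[str, str]:
    """Split 'n-m' into (n, m). '/-' is an escaped, literal dash; a dash with
    nothing before it is taken as a minus sign (assumption), not the separator."""
    left: list[str] = []
    right: list[str] = []
    target = left
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "/" and spec[i + 1:i + 2] == "-":
            target.append("-")  # escaped dash
            i += 2
            continue
        if ch == "-" and target is left and left:
            target = right  # first unescaped dash after some text separates n and m
        else:
            target.append(ch)
        i += 1
    if target is left:
        raise ValueError(f"not a range: {spec!r}")
    return "".join(left), "".join(right)


def is_range(arg: str, *specs: str) -> bool:
    """True if arg falls inside any of the given ranges. arg is never converted."""
    for spec in specs:
        n, m = parse_range(spec)
        if all(_NUMBER.fullmatch(s) for s in (arg, n, m)):
            lo, hi = sorted((int(n), int(m)))  # assumption: bounds in either order
            if lo <= int(arg) <= hi:
                return True
        elif len(arg) == 1 and len(n) == 1 and len(m) == 1:
            if ord(n) <= ord(arg) <= ord(m):  # ASCII collation
                return True
        # Anything else (e.g. a multi-character non-number) does not match.
    return False
```

Under these rules `is_range("B", "0-9")` is False, while `is_range("*", "!-/")` and `is_range("q", "0-9", "a-z", "A-Z")` are True.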
The opinions I express represent my own folly.

Matt

  • Administrator
  • Citizen of the Universe
  • Posts: 42372
  • Shoes gone again!
Re: Change Request: IsRange()
« Reply #1 on: January 28, 2013, 07:17:17 pm »

I'm subscribed.

Could you help me understand what you mean by 'ASCII collation'?

Could you give an example that returns unexpected results with the current implementation?

Thanks.
Matt Ashland, JRiver Media Center

MrC

Re: Change Request: IsRange()
« Reply #2 on: January 28, 2013, 08:50:44 pm »

Great.

The basic problem: isrange(B,0-9) returns True.

Usage example, testing the first letter of an artist:

   if(isrange(left([Artist],1), 0-9), Digit, Non-Digit)

always returns Digit.

ASCII collation: Convert the single character into its ASCII integer value and compare it numerically against the (also converted) n and m range values.  So:

! (0x21) to / (0x2f) form an ASCII collating sequence, which is often specified as the range !-/.

  ! (0x21) <= * (0x2a) <= / (0x2f)

Any two ASCII characters n and m specify an ASCII range, which includes all the ASCII characters n through m, inclusive.

! (0x21) through ~ (0x7e), specified as !-~, includes all visible ASCII characters.

What's nice about this is that it doesn't break the existing 0-9, a-z, or A-Z functionality (ASCII comparison applies only when the argument and both range endpoints are single characters); it generalizes and extends it.  Nor does it break specifiers like 1-100, since 100 is not a single character (and hence not subject to ASCII testing).
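Because the collation rule is just a code-point comparison, the claims above are easy to check. Python is used here only as a calculator; the range lists are copied from the proposal, and `in_ranges` is a hypothetical helper. The sketch confirms that !-~ spans the 94 visible ASCII characters and that the four punctuation ranges plus 0-9, A-Z and a-z split them with no gaps or overlaps.

```python
import string

# (low, high) character pairs taken from the proposal above.
ALNUM = [("0", "9"), ("A", "Z"), ("a", "z")]
PUNCT = [("!", "/"), (":", "@"), ("[", "`"), ("{", "~")]


def in_ranges(ch: str, ranges: list[tuple[str, str]]) -> bool:
    """ASCII collation: compare ch's code point against each inclusive range."""
    return any(ord(lo) <= ord(ch) <= ord(hi) for lo, hi in ranges)


# !-~ (0x21 through 0x7e) covers every visible ASCII character.
visible = [chr(c) for c in range(ord("!"), ord("~") + 1)]

# Each visible character falls in exactly one of the two groups.
partitioned = all(in_ranges(c, ALNUM) != in_ranges(c, PUNCT) for c in visible)
punct_chars = "".join(c for c in visible if in_ranges(c, PUNCT))

print(len(visible), partitioned)          # 94 True
print(punct_chars == string.punctuation)  # True
print(in_ranges("*", [("!", "/")]))       # True: 0x21 <= 0x2a <= 0x2f
print(in_ranges("B", [("0", "9")]))       # False
```

The punctuation ranges turn out to be exactly Python's `string.punctuation`, i.e. all 32 ASCII punctuation characters.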

Matt

Re: Change Request: IsRange()
« Reply #3 on: January 29, 2013, 02:55:14 pm »

I think the IsRange(A, 0-9) is sort of a bug.  Or just something we never thought of.

So next build:
Changed: The IsRange(...) expression function will not consider a string value equivalent to zero when evaluating a numeric range (so IsRange(A, 0-9) will now return 0 instead of 1).
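A rough model of the before/after behaviour. It assumes (a guess, not stated in the thread) that the old code converted the argument the way C's atof() does, where text with no leading number becomes 0. `to_number_old`, `is_range_old` and `is_range_new` are hypothetical names.

```python
import re

_LEADING_NUMBER = re.compile(r"\s*[-+]?\d+(?:\.\d+)?")


def to_number_old(s: str) -> float:
    """Assumed old conversion: like C's atof(), a string with no leading number is 0."""
    m = _LEADING_NUMBER.match(s)
    return float(m.group()) if m else 0.0


def is_range_old(arg: str, lo: float, hi: float) -> bool:
    return lo <= to_number_old(arg) <= hi


def is_range_new(arg: str, lo: float, hi: float) -> bool:
    """Per the change note: a non-numeric string is no longer treated as zero."""
    try:
        value = float(arg)
    except ValueError:
        return False
    return lo <= value <= hi


first_letter = "Abba"[:1]  # like left([Artist],1)
print(is_range_old(first_letter, 0, 9))  # True  -> the old "always Digit" result
print(is_range_new(first_letter, 0, 9))  # False
```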

We can make more changes to "Do the Right Thing", but it might be easier to work in the other direction.  You say things that are unexpected or impossible and we'll see if we can get them going.

Thanks.

MrC

Re: Change Request: IsRange()
« Reply #4 on: January 29, 2013, 03:30:47 pm »

Some difficulties I run into today:

- It is very hard to detect non-printables, or any range of characters other than alphanumerics (for example, in [Description]).  The only way currently is to use Regex() or long chains of isequal().  But this becomes a problem if Regex() has already been used, since subsequent calls to Regex() destroy [R#] values.

- It is very hard to use output from Regex() again in another Regex(), so I use obtuse methods to work around this.  I routinely use Regex() to grab pieces of a string into [R#] values for splicing in various ways.  But often I want to test the content of an [R#] value for, say, any punctuation, non-printable, or diacritic characters.  This can't easily be done currently.  Hence the IsRange() request for a broader meaning of a range.
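What is being asked for here amounts to testing every character of a value against code-point ranges. A short Python sketch of that test; the helper and range names are hypothetical, and treating anything above 0x7f as a candidate diacritic is a deliberate simplification.

```python
# Inclusive code-point ranges; the names are illustrative only.
NON_PRINTABLE = [(0x00, 0x1F), (0x7F, 0x7F)]                            # ASCII control chars and DEL
PUNCTUATION = [(0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]  # !-/ :-@ [-` {-~
NON_ASCII = [(0x80, 0x10FFFF)]                                          # crude stand-in for diacritics


def chars_in_ranges(text: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Return the characters of text whose code points fall in any range."""
    return [ch for ch in text if any(lo <= ord(ch) <= hi for lo, hi in ranges)]


description = "What is the meaning\nof life?"
print(chars_in_ranges(description, NON_PRINTABLE))  # ['\n']
print(chars_in_ranges(description, PUNCTUATION))    # ['?']
print(chars_in_ranges("Beyonc\u00e9", NON_ASCII))   # ['é']
```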

Side topic: I think we discussed once how [R#] values are clobbered in subsequent calls.  Here's an example:

Description: What is the meaning\nof life?

regex([Description], /#(\w+) (\w+) (\w+)#/, -1)/
regex([R1], /#([aeiou])#/, -1)[R1] - [R2] - [R3]/
regex([R2], /#([^aeiou])#/, -1)[R1] + [R2] + [R3]

The [R#] values from the first Regex() are clobbered by the second Regex().  While we can test [R1] in the second Regex(), you can see how cascading Regex() calls don't work as we'd like.  My workaround is to use global variables, but that becomes overwhelmingly ugly.  As per the topic above, this makes Regex() far less useful as a match-testing and capturing tool, so I'm relying on other functions to support the goal.  Note: the problem would be generally solved w/support for general variables in the expression language - this would allow assigning the captures for later reuse.
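To make the clobbering concrete, here is a small Python model of a single shared set of [R#] values that every regex call overwrites. `CaptureStore` is a hypothetical stand-in, not how Media Center stores captures, and it assumes a failed match also clears the old captures, which the thread does not say. The `saved` dict plays the role of the global-variable workaround.

```python
import re


class CaptureStore:
    """Hypothetical model: one shared set of [R1], [R2], ... values."""

    def __init__(self) -> None:
        self.groups: tuple = ()

    def regex(self, text: str, pattern: str) -> bool:
        m = re.search(pattern, text)
        self.groups = m.groups() if m else ()  # every call replaces all captures
        return m is not None

    def r(self, n: int) -> str:
        """[Rn]: capture n from the most recent call, or '' if there is none."""
        if n <= len(self.groups) and self.groups[n - 1] is not None:
            return self.groups[n - 1]
        return ""


store = CaptureStore()
store.regex("What is the meaning", r"(\w+) (\w+) (\w+)")
saved = {"a": store.r(1), "b": store.r(2), "c": store.r(3)}  # workaround: copy before reuse

store.regex(store.r(1), r"([aeiou])")  # second call, fed from [R1]
print(store.r(1), repr(store.r(2)))    # a ''   ([R2] = "is" has been clobbered)
print(saved["b"])                      # is
```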

regex([Description], /#(\w+) (\w+) (\w+)#/, -1)/{a}:=[R1]{b}:=[R2]{c}:=[R3]
regex({a}, /#([aeiou])#/, -1)...
regex({b}, /#([^aeiou])#/, -1)...

Sweeter still would be named captures:

regex([Description], /#(\w+) (\w+) (\w+)#/, {a}, {b}, {c})
regex({a}, /#([aeiou])#/, {d})...
regex({b}, /#([^aeiou])#/, {e})...
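For comparison, Python's `re` module already supports named captures with `(?P<name>...)`, and each match result is an independent object, so nothing gets clobbered. The sketch mirrors the three calls above; it only illustrates the idea and is not a proposal for Media Center's syntax.

```python
import re

description = "What is the meaning\nof life?"

first = re.search(r"(?P<a>\w+) (?P<b>\w+) (?P<c>\w+)", description)
captures = first.groupdict()  # {'a': 'What', 'b': 'is', 'c': 'the'}

d = re.search(r"(?P<d>[aeiou])", captures["a"]).group("d")   # first vowel of "What"
e = re.search(r"(?P<e>[^aeiou])", captures["b"]).group("e")  # first non-vowel of "is"

# The first match object is untouched by the later searches.
print(captures, d, e, first.group("c"))
```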

MrC

Re: Change Request: IsRange()
« Reply #5 on: January 29, 2013, 06:22:44 pm »

I just wanted to add, thanks for the changes to this and everything else.

My feedback is meant only as suggestions, and I have no (unrealistic) expectations of more than you offer. :-)

Thanks again folks.

Matt

Re: Change Request: IsRange()
« Reply #6 on: January 29, 2013, 06:47:02 pm »

This simple expression works:
Regex(Abba, /(.+/), -1)Regex([R1], /(.+/), -1)[R1]

Capture Abba to R1, capture R1 back to R1, then output R1.  It shows Abba.

Doesn't that mean you can use a capture in the next RegEx(...) call?  Or am I missing something?

MrC

Re: Change Request: IsRange()
« Reply #7 on: January 29, 2013, 06:54:25 pm »

That works fine.  It's when you capture two or more items in the first regex().  You can use any of the captures as an argument to the second regex() call, but that call will clobber the other (as yet unused) [R#] values.

But this is normal, so I wasn't expecting that to change (Perl does the same thing, btw):

echo "What is the meaning\noflife" |
   perl -ne '
      /(\w+) (\w+) (\w+)/ and print "$1 - $2 - $3\n";
     $2 =~ /([aeiou])/ and print "$1 + $2 + $3\n";
    '
What - is - the
i +  +

DoubtingThomas

  • Citizen of the Universe
  • Posts: 564
Re: Change Request: IsRange()
« Reply #8 on: January 29, 2013, 07:28:28 pm »

Abba ?? ha ha ha...

MrC

Re: Change Request: IsRange()
« Reply #9 on: January 29, 2013, 07:31:29 pm »

Like he said:  Ab{2}a [?]{2} (ha ){3}\.\.

Matt

Re: Change Request: IsRange()
« Reply #10 on: January 29, 2013, 08:20:22 pm »

Quote from: MrC
   Note: the problem would be generally solved w/support for general variables in the expression language - this would allow assigning the captures for later reuse.

   regex([Description], /#(\w+) (\w+) (\w+)#/, -1)/{a}:=[R1]{b}:=[R2]{c}:=[R3]
   regex({a}, /#([aeiou])#/, -1)...
   regex({b}, /#([^aeiou])#/, -1)...

Couldn't you use Save(...), like:
regex([Description], /#(\w+) (\w+) (\w+)#/, -1)/Save([R1], a)Save([R2], b)Save([R3], c)
regex([ a ], /#([aeiou])#/, -1)...
regex([ b ], /#([^aeiou])#/, -1)...

MrC

Re: Change Request: IsRange()
« Reply #11 on: January 29, 2013, 08:25:45 pm »

That's exactly what I'm doing now.

MrC

Re: Change Request: IsRange()
« Reply #12 on: January 29, 2013, 08:32:16 pm »

Here's one I just posted...

if(regex([Artist], /#([^[:punct:]\s])[[:punct:]\s]*?([^[:punct:]\s])#/),
   ifelse(
      isrange([R2], a-d), [R1]a-[R1]d,
      isrange([R2], e-h), [R1]e-[R1]h,
      isrange([R2], i-l), [R1]i-[R1]l,
      isrange([R2], m-p), [R1]m-[R1]p,
      isrange([R2], q-t), [R1]q-[R1]t,
      isrange([R2], u-z), [R1]u-[R1]z,
      1, [R1]*
   ),
   *
)\replace([Artist], ;, /,)&datatype=[list]


It gets pretty ugly when you have to keep load()ing:

if(regex([Artist], /#([^[:punct:]\s])[[:punct:]\s]*?([^[:punct:]\s])#/),
   save([R1], rone)save([R2], rtwo)/
   ifelse(
      isrange(load(rtwo), a-d), load(rone)a-load(rone)d,
      isrange(load(rtwo), e-h), load(rone)e-load(rone)h,
      isrange(load(rtwo), i-l), load(rone)i-load(rone)l,
      isrange(load(rtwo), m-p), load(rone)m-load(rone)p,
      isrange(load(rtwo), q-t), load(rone)q-load(rone)t,
      isrange(load(rtwo), u-z), load(rone)u-load(rone)z,
      1, load(rone)*
   ),
   *
)\replace([Artist], ;, /,)&datatype=[list]
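The same bucketing logic, sketched in Python to show what the expression computes: take the first two non-punctuation, non-space characters ([R1] and [R2]) and map [R2] into a-d, e-h, and so on. Python's `re` has no POSIX `[:punct:]` class, so the ASCII punctuation ranges are spelled out. Lower-casing [R2] is my assumption, since the thread doesn't say whether IsRange() letter ranges are case-sensitive; `artist_bucket` is a hypothetical name, and the `\replace([Artist], ;, /,)` second level is left out.

```python
import re

# No POSIX [:punct:] in Python's re, so the ASCII punctuation ranges are spelled out.
SEP = r"[\s!-/:-@\[-`{-~]"    # whitespace or punctuation
WORD = r"[^\s!-/:-@\[-`{-~]"  # anything else
FIRST_TWO = re.compile("(" + WORD + ")" + SEP + "*?(" + WORD + ")")
BUCKETS = [("a", "d"), ("e", "h"), ("i", "l"), ("m", "p"), ("q", "t"), ("u", "z")]


def artist_bucket(artist: str) -> str:
    """Return e.g. 'Aa-Ad' for 'Abba', '<R1>*' if [R2] fits no bucket, '*' if no match."""
    m = FIRST_TWO.search(artist)
    if m is None:
        return "*"
    r1, r2 = m.group(1), m.group(2).lower()  # assumption: case-insensitive buckets
    for lo, hi in BUCKETS:
        if lo <= r2 <= hi:  # single characters compare by code point
            return f"{r1}{lo}-{r1}{hi}"
    return f"{r1}*"


for name in ["Abba", "The Beatles", "AC/DC", "U2", "!!!"]:
    print(name, "->", artist_bucket(name))
```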

Matt

Re: Change Request: IsRange()
« Reply #13 on: January 29, 2013, 08:33:00 pm »

Quote from: MrC
   That's exactly what I'm doing now.

So are you just looking for cleaner syntax?

It's easiest to add functions (as opposed to new syntax).  Is there a function that could help?

We might also be able to roll the [R1], etc. variables a few levels deep, like this:
[R1] rolls to [RR1] to [RRR1] (or some other naming we can agree on)

I'd have to look at the code to see if that idea can be done without impacting performance.  Off the top of my head, I'm not sure how capture variables are stored, and it would only work if we could do it cleanly without requiring a memory copy.
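One way to picture the rolling idea is a fixed-depth queue of capture sets, newest first. This hypothetical Python sketch uses `collections.deque` with `maxlen`, so the oldest set falls off automatically and only references to the capture tuples move, not the strings themselves. The names `RollingCaptures`, `push` and `get` are illustrative. It also makes the positional nature visible: every extra regex call shifts what [RR#] and [RRR#] refer to.

```python
from collections import deque


class RollingCaptures:
    """Hypothetical: level 1 is [R#], level 2 is [RR#], level 3 is [RRR#]."""

    def __init__(self, depth: int = 3) -> None:
        self.history: deque = deque(maxlen=depth)  # newest capture set at index 0

    def push(self, groups) -> None:
        """Record the captures of a new regex call; the oldest set drops off when full."""
        self.history.appendleft(tuple(groups))

    def get(self, level: int, n: int) -> str:
        """Capture n at the given level, or '' if that level or capture doesn't exist."""
        if not 1 <= level <= len(self.history):
            return ""
        groups = self.history[level - 1]
        return groups[n - 1] if 1 <= n <= len(groups) else ""


caps = RollingCaptures()
caps.push(["What", "is", "the"])       # first regex: [R1] [R2] [R3]
caps.push(["a"])                       # second regex, fed from [R1]
print(caps.get(1, 1), caps.get(2, 2))  # a is   ([R1] and [RR2])
```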

MrC

Re: Change Request: IsRange()
« Reply #14 on: January 29, 2013, 08:43:59 pm »

Understood re: new functions vs. changed syntax (scary lexer/parser changes).  I also appreciate the dialog.

I have an idea, but don't know if your syntax can handle it.

Can you create scoped on-the-fly variables, which could essentially be shorthand for load(var)?  After a call to save(var) in the current scope of the expression, [var] would then essentially be a reference to load(var).  This would possibly be one of those scary changes.

But then the syntax becomes much easier to deal with.

As for the deep variables for saving Regex() captures, this seems like a problem waiting to happen.  Delete one regex() expression from a chain and suddenly you have to change all the variable names.

Matt

Re: Change Request: IsRange()
« Reply #15 on: January 29, 2013, 08:49:22 pm »

Quote from: MrC
   Can you create scoped on-the-fly variables, which could essentially be shorthand for load(var)?  After a call to save(var) in the current scope of the expression, [var] would then essentially be a reference to load(var).  This would possibly be one of those scary changes.

The reason that Load(...) is required is that it accesses a global variable store and requires a thread lock which has a performance impact.

Regular Field(...) calls (which brackets are shorthand for) do not require this.  Since they're the most common function, speed is critical.

However, it might be safe to go to the global variable store any time the Field(...) function doesn't resolve to a field or user variable.  This gives you the functionality of [var] without a performance hit.
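A minimal Python sketch of the lookup order just described, assuming fields and special variables live in plain per-evaluation dicts while Save()/Load() values sit in one shared store behind a lock. `Evaluator`, `field` and `save` are hypothetical names; the point is only that the common case never touches the lock, and the locked global lookup happens only when a name resolves to nothing else. Returning the bare name on a total miss follows Matt's description of Load([rone]) outputting the literal "rone".

```python
import threading
from typing import Optional

_global_store: dict[str, str] = {}  # values written by Save(value, name)
_global_lock = threading.Lock()     # the shared store needs a lock; fields don't


def save(value: str, name: str) -> str:
    with _global_lock:
        _global_store[name] = value
    return ""  # outputs nothing, so "Save(It works!, v)[v]" shows only "It works!"


class Evaluator:
    """Hypothetical per-evaluation context for [name] (i.e. Field(name)) lookups."""

    def __init__(self, fields: dict[str, str], special: Optional[dict[str, str]] = None) -> None:
        self.fields = fields
        self.special = special or {}

    def field(self, name: str) -> str:
        if name in self.fields:   # fast path: no lock taken
            return self.fields[name]
        if name in self.special:  # e.g. [Total Time] in the display text
            return self.special[name]
        with _global_lock:        # slow path, only when nothing else matched
            if name in _global_store:
                return _global_store[name]
        return name               # unresolved: the literal name


ev = Evaluator({"Artist": "Abba"})
out = (save("foo", "v1") + save("bar", "v2") + save("baz", "Artist")
       + ev.field("v1") + ev.field("v2") + ev.field("Artist"))
print(out)  # foobarAbba
```

Because fields are checked first, the "baz" saved under Artist is unreachable through [Artist], which matches the foobarAbba output reported in reply #17.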

Also, a little thing, but in your example, you want Load(rone), not Load([rone]) (which expands to Load(Field(rone)) and requires checking the user variable space for rone, finding no match, and outputting the literal "rone").

MrC

Re: Change Request: IsRange()
« Reply #16 on: January 29, 2013, 08:56:05 pm »

I recall you mentioning that [ ] is already shorthand for Field().

If you're able to fall back to the global store on unresolved Field() values, that could be great.  As an alternative, should that not work, maybe a new bracketing syntax could be used for globals.

The load() issue in the example was, in glynor's words, a copy-pasta error, al-dente for sure (I'm hopeless without my RE global replacements in an editor).  Fixed.

Matt

Re: Change Request: IsRange()
« Reply #17 on: January 30, 2013, 01:07:42 pm »

Quote from: MrC
   If you're able to fall back to the global store on unresolved Field() values, that could be great.

This was easy and has no real performance impact, so next build:
Changed: Global expression variables stored with Save(...) can be accessed by using brackets (e.g. "Save(It works!, v)[v]" will output "It works!").

A slightly more complex example:
Save(foo, v1)Save(bar, v2)Save(baz, Artist)[v1][v2][Artist]

Will output:
foobarAbba

In other words, you can't get to a global variable that uses a reserved name (such as the field name Artist) by using brackets; [Artist] still outputs the field value.

Above where I said "user variable", it would probably be more correct to call them "special variables".  The expression evaluator often defines special variables.  For example, the [Total Time] token available in the display text at the top of the program.

MrC

Re: Change Request: IsRange()
« Reply #18 on: January 30, 2013, 01:25:39 pm »

As Charlie says to Raymond, "You are beautiful, man".