Topic: Doing IsEqual() on Very Large Field - Performance Impact? (Read 3261 times)

vagskal · « **on:** August 13, 2012, 01:36:22 pm »

Warning: This post will probably only make sense to advanced users (if I am lucky).

It would be cool to be able to see if a song in an album charted, i.e. was released as a successful single. I think I can get access to rather good chart data. I was contemplating adding all that data (several MB) in a structured way to one custom MC field (_not_ written to the file tags) for all music files and then have an expression column in my views with something like this If(IsEqual([Artist] ¤ [Name],[The huge field with chart data],8),★,).

Since it would require a bit of work (and probably regex help from MrC) to get the chart data structured and in shape, I wondered if anyone had any experience regarding the performance when using IsEqual() on a very large field. Is it even worth considering, or would MC just come to a halt when viewing my views? I am also a bit concerned about the impact on the database file size. Would the database file increase by several MB for each of my some 150k music files? (I keep the database file on my C: SSD drive, which is about to run out of space.) Would it be better to store the chart data in a custom field in just one music file in the MC database and Load() that field as a global variable every time I invoke a view?

One reason I ask is that some time ago I tried to use a global variable to store all album artists (just some 4k album artists) using IsEqual(), and I could not get that to work. I used an expression column with something like If(IsEqual([Album Artist],Load(VarAlbumArtist),,Save(Load(VarAlbumArtist); [Album Artist]) to then try to use If(IsEqual(Primary Artist),Load(VarAlbumArtist),7),[Album Artist],) to have all songs by an album artist displayed, including songs on VA albums ([Primary Artist] is a custom expression field with the expression ListItem([Artist],0), which shows just the first artist in a list of artists). Any ideas on how to achieve this (a pane listing only artists that are an Album Artist, but showing all songs by that artist, including songs on VA albums) would also be welcome.

Matt · « **Reply #1 on:** August 13, 2012, 08:55:09 pm »

Several MB of data isn't a big deal. The database pools data when files use the same value. And large strings get compressed on disk and in memory. Look at the size of the .jmd file on disk to get an idea of the memory the field takes (they're roughly related).
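
To make the pooling and compression idea concrete, here is a minimal Python sketch. It is not MC's actual implementation: the ValuePool class and its method names are hypothetical, and zlib and SHA-256 are stand-ins chosen only for illustration.

```python
import hashlib
import zlib


class ValuePool:
    """Hypothetical pool: each distinct string is stored once, compressed.

    Rows keep only a small slot number instead of their own copy.
    """

    def __init__(self):
        self._slots = {}   # SHA-256 digest of the value -> slot number
        self._blobs = []   # compressed bytes, one entry per distinct value

    def intern(self, value):
        """Return the slot for value, storing it only if it is new."""
        raw = value.encode("utf-8")
        # Assumption: digest collisions are ignored in this sketch.
        digest = hashlib.sha256(raw).digest()
        slot = self._slots.get(digest)
        if slot is None:
            slot = len(self._blobs)
            self._slots[digest] = slot
            self._blobs.append(zlib.compress(raw))
        return slot

    def get(self, slot):
        """Decompress and return the string stored in slot."""
        return zlib.decompress(self._blobs[slot]).decode("utf-8")

    def stored_bytes(self):
        """Total compressed size of all distinct values."""
        return sum(len(blob) for blob in self._blobs)
```

In a scheme like this, assigning the same large value to many files adds one compressed copy plus one small slot number per file, which is why the size on disk need not grow by several MB per file.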

It's hard to answer the performance question without knowing more. You have a huge number of files (100k+), and some of the most complicated views I've seen (12+ panes). Those are both headwinds to blazing performance. But the database and expression language are generally quite fast ( http://jriver.com/speed.html ) so you might be fine.

If you try it, please let us know what you find.

Vincent Kars · « **Reply #2 on:** August 14, 2012, 03:38:20 pm »

I wonder if a relational field might be useful.
Say one populates Series with a single value, then creates a field “Chart” and sets it to relational with Series.

There are some impressive posts about combining chart data with JRiver.
The problem is always the same, how to combine external tabular data with tabular data inside JRiver.

An option would be to export the value of the tags to e.g. MS Access.
Inside Access join with your own tabular data and import in JRiver.

One can already do so, more or less, with playlists.

Matt, what about such a feature?

vagskal · « **Reply #3 on:** August 14, 2012, 05:30:43 pm »

Thanks for the replies.

Matt's confidence in the database and expression language speed made me decide to at least test. I am afraid I do not have much to report yet, apart from the fact that pasting about 1.4 MB of data into a custom user data field in some 150k files takes a V E R Y long time. I did the pasting around 1 PM and now, at 12 PM, MC is still contemplating that action and cannot be used. I will give it the night as well. I wonder if it would take this long to paste the data to just one file.

Vincent, I am not sure I follow your idea about the Series field. I have seen some threads on importing chart data, but that just seemed to me like too much trouble. For me it is for now enough if there is an easy way to indicate if a song charted at all. When/If I decide to get the chart data in shape (consistent naming of song names and artist) I will do it on the full data set in order to take advantage of that clean up if it gets easier to import data to MC.

I will report back.

rick.ca · « **Reply #4 on:** August 14, 2012, 11:56:24 pm »

Quote

pasting about 1.4 MB data into a custom user data field in some 150k files takes a V E R Y long time.

I have no idea why that would take so long, but before Matt's comment I assumed it would be a problem—and figured using a global variable would be the way to go. I just tried that now, and it works fine.

A while back I added chart data ('Hot 100' for years 1955 to 2009). You participated in discussions about that at the time, so I won't get into it here. That data makes a good test because it's already prepared and added to my library—so I can compare the results of the two methods.

I added the custom list field [Chart.Hot100] and pasted the chart data to it for one file only. I then added an expression column to Save() that, and then Load() it for comparison to each file's [Artist] - [Name]...

If(IsEqual([Filename], D:\Audio\Popular\a-ha\2010 The Very Best of\01 Take On Me.mp3), Save([Chart.Hot100], Hot100), )
If(IsEqual(Load(Hot100), [Name] - If(IsEqual(Left([Artist], 4), The/ ), RemoveLeft([Artist], 4), [Artist]), 8 ), Hot 100, )

(BTW, a-ha, like ABBA, only exists in my library as an historical curiosity. Being at the top of the list, it's the subject of many experiments. In this case, strangely enough, the first track also charted.)

'The ' had been removed from the artist names in the data, so I did the same here. And all this does is flag the hits as 'Hot 100'. The files are already tagged with the chart year and rank, which is obviously more informative. Had I been more patient, I could have extracted that from the data using Regex(). Instead of banging my head on the wall, I'll leave that detail for MrC. The form of my chart data is: YYYY - ### - Name - Artist. It should be easy, but...
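
For anyone preparing the data outside MC first, the year and rank can be pulled out of that layout with a short script. This is a minimal Python sketch, assuming every line really follows "YYYY - ### - Name - Artist"; the function name parse_chart_line is hypothetical and this is not the MC Regex() expression itself.

```python
import re

# Assumed line layout, taken from the post: "YYYY - ### - Name - Artist".
CHART_LINE = re.compile(r"^(?P<year>\d{4}) - (?P<rank>\d{1,3}) - (?P<rest>.+)$")


def parse_chart_line(line):
    """Return (year, rank, name, artist), or None if the line does not fit."""
    match = CHART_LINE.match(line.strip())
    if match is None:
        return None
    # Split on the LAST " - " so a dash inside the song name survives.
    # An artist that itself contains " - " would still be split wrongly.
    name, separator, artist = match.group("rest").rpartition(" - ")
    if not separator:
        return None
    return int(match.group("year")), int(match.group("rank")), name, artist
```

Lines that return None are worth listing separately, since they usually point at inconsistencies in the source data.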

Although this approach is nice in that it avoids the manual tagging step that other methods require, it still suffers from the same issue that makes any method difficult. Unless the Artist and Name match exactly, it doesn't work. So that leaves out files—with no means of determining what or how many were missed. In my library, it only found 2/3 of the files I had previously tagged. That could be improved by a more creative substring comparison, but I'm sure it would still fall quite short of 100%.
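
One way to attack both problems (near-miss names and not knowing what was missed) is to compare normalized keys and keep a list of the misses. The following is a minimal Python sketch run outside MC; the helper names normalize and match_chart are hypothetical, and the normalization rules are only an assumption about what would help.

```python
import re


def normalize(artist, name):
    """Build a comparison key that tolerates case, a leading 'The', and punctuation.

    The same cleaning is applied to both the artist and the song name.
    """
    def clean(text):
        text = text.casefold().strip()
        text = re.sub(r"^the\s+", "", text)        # drop a leading "The"
        text = re.sub(r"[^\w\s]", "", text)        # drop punctuation
        return re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return clean(artist), clean(name)


def match_chart(library, chart):
    """Split chart entries into those found in the library and those missed.

    Both arguments are iterables of (artist, name) pairs.
    """
    library_keys = {normalize(artist, name) for artist, name in library}
    found, missed = [], []
    for artist, name in chart:
        if normalize(artist, name) in library_keys:
            found.append((artist, name))
        else:
            missed.append((artist, name))
    return found, missed
```

The missed list shows exactly which chart entries have no counterpart, so they can be checked by hand instead of guessed at.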

vagskal · « **Reply #5 on:** August 15, 2012, 04:02:41 am »

Thanks for the reply and testing, Rick. I suppose your report means that you experience no performance issues when doing IsEqual() on the global variable. The data you pasted into a field in one file and loaded to a global variable must be a lot smaller (100 items times some 75 years) than what I am working with.

A progress(?) report: When I got to my PC this morning I was met with MC's warning message that saving data to this many files could take a very long time. I thought it was a bit late for that message to appear. Although I recently had a case where a friendly warning message from MC about my own stupidity would have saved me a lot of work, the warning in this case just unnecessarily halted a process that already had been going on for hours. I clicked OK on the warning message some 45 minutes ago and MC is still working. This is obviously not something I am going to do every time I have to correct a piece of information in the chart data.

Matt, I interpreted your reply so that pasting identical data to one or very many files would take an almost equal amount of time (the data pooling thing). Is that correct, or would using Rick's method (pasting the data to just one file) speed things up?

I think I will soon have to put MC out of the misery I created for it and file this idea under stupid things you should not do with such clever software.

vagskal · « **Reply #6 on:** August 15, 2012, 10:50:38 am »

OK, MC is finished after a very long time.

The .jmd file for the huge [Billboard TEST] field I made was only 715 kB (If I zip the .txt file I pasted it is 424 kB).

I made an expression column with this expression If(IsEqual([Billboard TEST],[Primary Artist] ¤ [Name],8),★TOP,), where [Primary Artist] is the (only or) first artist in a list of artists and [Billboard TEST] is the raw chart data formatted like [Artist] ¤ [Name]; [Artist] ¤ [Name]. The expression worked and seemed to return reasonable hits. I did not notice any impact on performance (apart from the looong wait to get the data into MC) when I just scrolled in a view with grouping on. If I sorted that view by [Name] thereby disabling grouping, I got a very noticeable decrease in performance. If I tried to sort by the expression column, MC stalled for so long that I did not have any patience left to wait it out. Changing one file tag took considerably longer than before, so long that it would be painful to have the expression column in a view used for tagging.

This did not go so well.

I think I will just archive the current library and restore the backup I made before adding the data. Matt, I can give you access to the library if you want to do some stress testing with huge fields without having to wait a day or so to get the data into MC.

Matt · « **Reply #7 on:** August 15, 2012, 10:54:34 am »

Quote from: vagskal on August 15, 2012, 10:50:38 am

I think I will just archive the current library and restore the backup I made before adding the data. Matt, I can give you access to the library if you want to do some stress testing with huge fields without having to wait a day or so to get the data into MC.

Could you provide the simplest step-by-step to reproduce the day long delay?

I'd be happy to look at the library backup too, although it might be a while before I have some free time for it.

vagskal · « **Reply #8 on:** August 15, 2012, 11:13:23 am »

Sure. I will email you links to the library and the .txt file I used.

1. In MC create a new custom user data field. Data type: String, not relational, Edit Type: Standard and NOT saved to file tags.
2. Open the .txt file in a text editor (I used MS Word 2007) and do ctrl+a and ctrl+c.
3. In MC show the newly created field in the tag action window, do ctrl+a on the entire audio library to select all audio files, enter that field in the tag action window, do ctrl+v, hit enter (or return, cannot remember) and wait for a day to get the warning message about the long time it can take. Dismiss that message by clicking OK (or maybe Yes) and wait for another couple of hours. I used the view "Artist/Kompositör Pane" in my library when pasting. The view "Test Pane\Artist/Kompositör Pane (1)" in my library contains the expression column I used.

The MC benchmark for my PC is: JRMark (version 17.0.184): 2358

No hurry. I am used to waiting by now.

Matt · « **Reply #9 on:** August 15, 2012, 05:29:22 pm »

A coming beta build will fix the slow paste issue.

But there are still a lot of areas of the program that aren't all that happy with huge strings.

Normally a big value is a few hundred characters. This is 10,000 times larger.

I'll slowly chip away at a few more of these bottlenecks, but I wouldn't recommend this approach for now.

Instead, run your expression and save the result to another field. That static field will be fast.
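
The difference between the two approaches can be sketched in a few lines of Python. This only illustrates the idea and says nothing about how MC evaluates expressions internally; is_charted and store_flags are hypothetical names, and the "Artist ¤ Name" key format is borrowed from the expression earlier in the thread.

```python
def is_charted(chart_blob, artist, name):
    """Dynamic check: scans the whole blob on every call.

    Case-insensitive substring test, loosely analogous to IsEqual(..., 8).
    """
    return f"{artist} ¤ {name}".casefold() in chart_blob.casefold()


def store_flags(files, chart_blob):
    """One-time pass: save each result as a plain per-file value."""
    for file_info in files:
        file_info["charted"] = is_charted(
            chart_blob, file_info["artist"], file_info["name"]
        )
```

After store_flags has run once, sorting or filtering reads the small stored value, for example files.sort(key=lambda f: f["charted"], reverse=True), and the large blob is never scanned again. In this sketch a plain substring test can also give false hits when one key is a prefix of a longer entry.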

rick.ca · « **Reply #10 on:** August 15, 2012, 06:39:38 pm »

Quote from: vagskal on August 15, 2012, 04:02:41 am

I suppose your report means that you experience no performance issues when doing IsEqual() on the global variable. The data you pasted into a field in one file and loaded to a global variable must be a lot smaller (100 items times some 75 years) than what I am working with.

No, there was no performance hit I could detect. But then I'm doing lots of other stupid stuff.

Yes, the data is less—about 250 KB.

I guess my approach avoids the 'slow paste' issue.

vagskal · « **Reply #11 on:** August 16, 2012, 03:17:33 am »

Thanks for fixing the slow paste. That was a fast fix. It was the most important obstacle in this use case, since I can do a one time operation to set the static data as you suggest once the data is imported.

I am fully aware that this is an abnormal use case pushing the limits, hence this whole thread.

vagskal · « **Reply #12 on:** August 16, 2012, 04:42:11 am »

As a beta tester I got a chance to try the build with the fix. Pasting 1.4 MB data to a field in my some 150k music files is now down from a day or so to under a minute. That is what I call progress!

rick.ca · « **Reply #13 on:** August 16, 2012, 06:00:10 pm »

Quote

That is what I call progress!

I call it proof there's always value in trying "stupid things you should not do with such clever software."

Did you try the global variable method? It would be interesting to know if there is any difference in the load time.

vagskal · « **Reply #14 on:** August 16, 2012, 06:44:35 pm »

Quote from: rick.ca on August 16, 2012, 06:00:10 pm

I call it proof there's always value in trying "stupid things you should not do with such clever software."

Yeah, stupidity finally pays off.

Quote from: rick.ca on August 16, 2012, 06:00:10 pm

Did you try the global variable method? It would be interesting to know if there is any difference in the load time.

I did, but only in the same view as I tried the other method so I cannot tell any difference. I deleted the expression column with your Load() method when I did the final performance tests so it would not interfere.

You could try it yourself now that it only takes under a minute to import the data (to 150k files so it should not be slower to import to just one file). The raw data came from the Whitborn (spelling might be incorrect) project you referred to in another thread, and I exported just the name and artist columns to a .txt file. However, I interpreted Matt's response so that it will take a while until MC accommodates huge value fields whichever method used.

What about the other issue I wrote about in the initial post (loading all album artists to a global variable and use IsEqual() to find out which artists are also album artists). Have you tried and succeeded with that one?

rick.ca · « **Reply #15 on:** August 16, 2012, 10:32:13 pm »

Quote

However, I interpreted Matt's response so that it will take a while until MC accommodates huge value fields whichever method used.

Thanks. This is what I was interested in. It seems to confirm my understanding of what Matt said.

Quote

What about the other issue I wrote about in the initial post (loading all album artists to a global variable and use IsEqual() to find out which artists are also album artists). Have you tried and succeeded with that one?

No, I had difficulty understanding what you meant. On second reading, it seems you're looking for a way in which the multiple-artist albums in a view grouped by Album (i.e., the standard Artist-Album grouping) will include not just the tracks of the particular artist, but all the tracks of that album. So a soundtrack album, for example, would appear in its entirety under each artist contributing to it. That would be interesting, but I smile to think it would also mean all 500 tracks of my Rolling Stone 500 album would be listed under each of the 236 artists it includes. But I digress...

No, I have no idea how that might be done. At least not in the direct manner you're probably aiming for. It could be done indirectly, using the virtual album stupid trick. But that would require creating a virtual album for each artist on the album, which might be a bit much. This, however, is consistent with my system that only handles up to three unique virtual albums per track. That's based on the idea I would never want to define more than three virtual albums using the same tracks, normally from one artist (i.e., compilation albums). In the case of multiple artist albums, perhaps the same limitation is reasonable: I would never want to add more than three virtual albums, since it really only makes sense for the primary contributors to the album. :-\

vagskal · « **Reply #16 on:** August 17, 2012, 01:41:37 am »

This is what I meant (I see now that the example expressions I wrote from memory in the first post are less than perfect, so I leave them aside): Have one pane showing only artists that appear in the [Album Artist] field. Select an artist in that pane and have the file list show all files where that artist appears in the [Artist] field.
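
Stated as plain set logic, the behaviour I am after looks like the following minimal Python sketch. It is not an MC expression or view setting; the function names and the dictionary keys "album_artist" and "artists" are hypothetical stand-ins for [Album Artist] and the list field [Artist].

```python
def album_artist_pane(files):
    """Pane entries: every distinct, non-empty album artist value."""
    return sorted({f["album_artist"] for f in files if f["album_artist"]})


def files_for(files, selected):
    """File list: every file whose artist list contains the selected name."""
    return [f for f in files if selected in f["artists"]]
```

A track on a VA album is returned by files_for as long as the selected name is one of its artists, even though its album artist is something else.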

rick.ca · « **Reply #17 on:** August 17, 2012, 03:20:24 am »

Okay. I hesitate to say I understand, but let me try again. My answer is similar...I can't think of any direct way, but the indirect way is very easy. That would be just to have both [Artist] and [Album Artist] panes. Select a value from [Album Artist]. The [Artist] pane will then show only the artists with an [Album Artist] value matching the one selected. Select the artist you want from the [Artist] pane and reset the [Album Artist] pane to 'All'. The file list will now "show all files where that artist appears in the [Artist] field."

My library doesn't offer any good examples because I only have a handful of [Album Artist]s set. I do, however, have an [Artists] expression field that combines [Artist], [Album Artist], group [Members], [Featuring] and [Related] artists. With that, I can get a list of files associated with "Bob Dylan," including albums and tracks from The Band, The Travelling Wilburys and the Rolling Stone Top 500 as well as his solo albums. I can then select "Bob Dylan" from the [Artist] pane to restrict the list to his solo albums.

vagskal · « **Reply #18 on:** August 17, 2012, 07:18:30 am »

Thanks for the reply.

Yes, I know of those methods and use them myself. I wanted it my way simply because I am used to it from other software: Having just a limited list of artists to browse (I consider only artists which have an album of their own in my collection to be my "real" artists) but still being able to show each artist's complete work in my collection.

And claiming something cannot be done in MC usually results in someone proving me wrong and I get things the way I want.

rick.ca · « **Reply #19 on:** August 17, 2012, 03:30:55 pm »

Quote

And claiming something cannot be done in MC usually results in someone proving me wrong and I get things the way I want.

I know, and I share your pain. Being able to do it the easy way is cold comfort when I know there must be a really cool stupid way that I can't quite figure out.
