INTERACT FORUM

Topic: Ext4 => ExFAT -- how can I find long filenames and prohibited characters?

drmimosa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 688

Hello,

I have a FLAC library of about 32,000 files, currently stored on an Ext4 filesystem. I would like to move all the files to a drive with an ExFAT filesystem.

In the past when I have done similar migrations, I have run into the following problems:

1. Filenames of classical music titles are too long, so these files get ignored and not moved.

2. Some songs have prohibited characters in the titles which are now in Ext4 filenames, and can't be moved to ExFAT.

3. Mystery problems - Some files don't copy

Quote
From wikipedia: https://en.wikipedia.org/wiki/ExFAT

Max. filename length   255 characters
Allowed characters in filenames: all Unicode characters except U+0000 (NUL) through U+001F (US) / (slash) \ (backslash) : (colon) * (asterisk) ? (question mark) " (quote) < (less than) > (greater than) and | (pipe)
(encoding in UTF-16LE)
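
The rules quoted above can be turned into a small check outside of Media Center. This is only a minimal sketch, assuming Python 3.9+; the names `utf16_length` and `exfat_name_problems` are made up for illustration, and counting the 255-character limit in UTF-16 code units is an assumption based on the UTF-16LE note in the quote.

```python
# Characters the quoted exFAT rules prohibit: U+0000-U+001F plus / \ : * ? " < > |
FORBIDDEN = set('/\\:*?"<>|') | {chr(c) for c in range(0x20)}
MAX_NAME_UNITS = 255  # per-name limit, counted in UTF-16 code units (assumption)


def utf16_length(name: str) -> int:
    """Number of UTF-16 code units needed to store `name`."""
    # surrogatepass keeps odd bytes from Linux filenames from raising an error
    return len(name.encode("utf-16-le", errors="surrogatepass")) // 2


def exfat_name_problems(name: str) -> list[str]:
    """Return problems for one file or folder name (not a full path)."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("prohibited characters: " + " ".join(repr(c) for c in bad))
    if utf16_length(name) > MAX_NAME_UNITS:
        problems.append(f"too long: {utf16_length(name)} UTF-16 units")
    return problems
```

Calling `exfat_name_problems()` on every name returned by `os.walk()` over the Ext4 library would list the candidates before any copy is attempted.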

So what often happens is that I attempt a copy, then go on a needle-in-a-haystack hunt for the files that failed to copy, and just those files. Even then, it's tough to verify that everything was copied, because some of the filenames have changed. rsync, for example, can't be used to compare the two /Music folders, so I'm relying only on the total number of files in /Music to confirm that I copied everything.
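
One way to verify a copy when names have changed is to compare file contents instead of filenames. The following is a rough sketch, not a tested migration tool: the function names are hypothetical, and it assumes the copy leaves every file byte-identical (a tool that rewrites tags during the copy would defeat it).

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_under(root: str) -> dict[str, list[str]]:
    """Map each content digest to the relative paths that have that content."""
    found: dict[str, list[str]] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            found.setdefault(file_digest(full), []).append(os.path.relpath(full, root))
    return found


def missing_from_copy(source_root: str, copy_root: str) -> list[str]:
    """Source files whose contents do not appear anywhere under copy_root."""
    copied = digests_under(copy_root)
    missing = []
    for digest, paths in digests_under(source_root).items():
        if digest not in copied:
            missing.extend(paths)
    return sorted(missing)
```

Note that two source files with identical contents count as copied if either one made it across, so this reports files that are missing, not a per-file count.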

Is there a way to address these problems with Rename, Move, Copy files?
Can I set a max filename length (that includes the filepath), for example, of 200 characters, leaving a buffer for the 255 filename max?
Can I remove any of the above prohibited characters?
Do I have to consider moving from a filesystem with owner and group permissions to one without those?
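
For the question about removing prohibited characters, the idea can be sketched in a few lines of Python. This is an illustration only, not how Rename, Move, Copy works; `clean_name` and the underscore replacement are arbitrary choices.

```python
import re

# The prohibited set from the quoted exFAT rules:
# control characters U+0000-U+001F and / \ : * ? " < > |
_PROHIBITED = re.compile(r'[\x00-\x1f/\\:*?"<>|]')


def clean_name(name: str, replacement: str = "_") -> str:
    """Replace every prohibited character in a single file or folder name."""
    return _PROHIBITED.sub(replacement, name)
```

It has to be applied to each folder and file name separately, never to a whole path, or the path separators would be replaced too.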

Does anybody have any other suggestions?

I'm posting it here because it is a very "linux-y" problem, but may repost in the Windows forum in a bit as well. Thanks for everyone's time and consideration here, any time spent thinking on this problem would be greatly appreciated!
Logged

mwillems

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 5174
  • "Linux Merit Badge" Recipient

So, at the risk of giving a very "linux forum" answer, have you considered whether you really need the files to be on an ExFAT drive?  It's an unreliable filesystem with lots of limitations, and if all you want is cross-platform readability, the NTFS drivers for Linux are quite mature and I believe they support longer filenames now too.

But assuming ExFAT is mandatory, when you say the copy fails, have you actually tried JRiver's rename, move, copy function or are you doing the copying with filesystem tools?  My recollection is that Rename, Move, Copy already takes into account and removes certain special characters automatically.  I see it doing that whenever I use it to rename files on samba shares that contain, say, colons or other special characters.   It may even enforce file lengths.  If you've tried rename, move, copy and it failed, we should compare notes on settings and see what might be different.

As an alternative, if Rename, Move, Copy isn't working right for you, you could try using the "handheld" system to sync to a directory. I think the handheld system may be better prepared to deal with weird filesystem problems, since it writes to arbitrary devices, many of which have historically used ExFAT.
Logged

drmimosa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 688

Thank you mwillems, I should have stated my original requirement: I'd like to be able to read the files on multiple OSes without issues (Linux, Windows, OSX). The goal is future-proofing the library so it is flexible across operating systems and can be read from a Mac Mini, among other things.

Another thing is that my wife only uses Macs, so I'd like any future backups of photos and media to be easily accessible, in case I can't give tech support for some reason.

Is there a better way to do this? Is there a better Linux/OSX-compatible filesystem than ExFAT?
Logged

drmimosa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 688

Quote
But assuming ExFAT is mandatory, when you say the copy fails, have you actually tried JRiver's rename, move, copy function or are you doing the copying with filesystem tools?  My recollection is that Rename, Move, Copy already takes into account and removes certain special characters automatically.  I see it doing that whenever I use it to rename files on samba shares that contain, say, colons or other special characters.   It may even enforce file lengths.  If you've tried rename, move, copy and it failed, we should compare notes on settings and see what might be different.

As an alternative if rename, move copy isn't working right for you, you could try and use the "handheld" system to sync to a directory as I think the handheld system may be more prepared to deal with weird filesystem problems since it's writing to random devices many of which may have used ExFAT historically. 

I haven't tried to move all of the files using Rename Move Copy in a long time, years ago in 2018 was the last full migration of the files to Ext4. Recently I tried and failed using rsync. With your vote of confidence I'll give it another shot with the MC Move tool and report back.

Thanks!
Logged

Awesome Donkey

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 7371
  • The color of Spring...

macOS only has read-only NTFS support by default. You *can* enable write support, but it's experimental, and in my own past experience it wasn't that good (plus it'll clutter your NTFS volumes with those cursed . dot files for every file). There are third-party NTFS drivers for macOS, like Paragon or Tuxera. They aren't free, but they don't create those dot files, so there's no clutter to constantly remove. Your mileage may vary, however. NTFS is my favored file system for storage shared between Windows and Linux, but with macOS in the mix you may have to buy an app like Paragon to get decent NTFS write speeds.
Logged

drmimosa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 688

Quote
So, at the risk of giving a very "linux forum" answer, have you considered whether you really need the files to be on an ExFAT drive?  It's an unreliable filesystem with lots of limitations,

Just curious, why is it considered unreliable?

A quick Google search yields this summary, which doesn't sound *too* bad:
Quote
Conclusion

    The exFAT file system is not as fragile as anecdotes on the Internet may lead you to believe. Most failures are limited to the file being written and interrupted writes do not corrupt the entire file system. Anecdotes of corruption on the Internet are likely due to bad implementation of the file system rather than the file system design itself.
    It is important to run CHKDSK or fsck after an interrupted write as invalid entries in the FAT or allocation bitmap may lead to future random data corruption.
    For mostly cold data requiring cross-platform compatibility, exFAT is a completely valid choice for a file system.

Nevertheless, you may not want to choose exFAT because:

    Due to large allocation unit (cluster size), exFAT is quite inefficient if you have a lot of small files (I’m looking at you, node_modules). At 2 TB, the allocation unit for exFAT is 512 KB (any file size will be rounded up to 512 KB) while for NTFS it is 4 KB and NTFS even has special handling for very small files storing them directly in the MFT. Thus for this use case, it is better to use the native file system of each OS.
    The tooling, even by Microsoft, is not as great as other file systems. For example, Windows does not provide a defrag tool or a resize tool for exFAT.

https://pawitp.medium.com/notes-on-exfat-and-reliability-d2f194d394c2
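
The cluster-size point in the quote is simple arithmetic, sketched here in Python. The function name is made up, and the 512 KB and 4 KB figures are taken from the quote rather than measured on a real volume.

```python
def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Space a file occupies when its size is rounded up to whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division on integers
    return clusters * cluster_size


EXFAT_CLUSTER = 512 * 1024  # 512 KB, the exFAT figure quoted for a 2 TB volume
NTFS_CLUSTER = 4 * 1024     # 4 KB, the NTFS figure from the same quote

small_file_on_exfat = allocated_bytes(1024, EXFAT_CLUSTER)       # 524288
small_file_on_ntfs = allocated_bytes(1024, NTFS_CLUSTER)         # 4096
flac_file_on_exfat = allocated_bytes(30_000_000, EXFAT_CLUSTER)  # 30408704
```

A 1 KB file takes 512 KB on such a volume, but a 30,000,000-byte FLAC file wastes only about 1.4%, so for a library of large audio files the overhead should matter far less than for node_modules.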
Logged

lepa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1965

If you go the ExFAT route, you could create RMC rules that check the length of the path/filename with the Length() function and truncate it if it is too long. Also use the Clean() function to remove any peculiar characters.
Logged

mwillems

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 5174
  • "Linux Merit Badge" Recipient

Quote
Just curious, why is it considered unreliable?

A quick google search yields this summary, which doesn't sound *too* bad:

Unreliable because there's no journal and a higher risk of corruption, but the limitations on file path length and symbols that can be included are also usability problems.  FWIW you can add me to those "anecdotes on the internet" that had an entire ExFAT filesystem corrupted by an interrupted write.  Perhaps it was a flaw in the filesystem implementation rather than the format itself, or maybe I didn't fsck when I should have, I couldn't say, but I lost a fair bit of data that way back in the bad old days.  Now, I only use ExFAT where it's strictly required.

I don't know much about OSX filesystem compatibility, so I can't make a recommendation; when I need both Windows and Linux I use NTFS, but Awesome Donkey suggested that NTFS isn't ideal for OSX.  FWIW, I currently just share my files over the network via SMB/CIFS which works pretty well cross platform and doesn't care what the underlying file system format is, but I recognize that's not the best option for everyone.

Quote
I haven't tried to move all of the files using Rename Move Copy in a long time, years ago in 2018 was the last full migration of the files to Ext4. Recently I tried and failed using rsync. With your vote of confidence I'll give it another shot with the MC Move tool and report back.

Thanks!

I'd advise that you try it with a small batch first just to test.  It will be easier to verify the behavior that way.
Logged

zybex

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 2370

IMHO the best way is to use a NAS to share the files, as all OSs support SMB/NFS/CIFS shares. So instead of a USB disk moving around your PCs, you'd have network-connected storage that can be simultaneously accessed by all your computers, regardless of the OS. The NAS filesystem is usually Ext4-based, but it's better to avoid the same forbidden characters as in exFAT and NTFS because the files will be accessed by Windows too.

To answer your original question - note that the length limit is 255 chars *per path item*. The full path can have up to 32760 chars (~32K), but the filename and each folder name in the path cannot exceed 255 chars.
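
To make the per-item rule concrete, here is a small Python sketch that checks both limits. The function name is hypothetical, the path is assumed to be relative to the exFAT mount point, and plain `len()` is used as an approximation of how the filesystem counts characters.

```python
MAX_ITEM = 255    # limit for each folder or file name, as stated above
MAX_PATH = 32760  # limit for the full path, as stated above


def path_length_problems(rel_path: str, sep: str = "/") -> list[str]:
    """Check a path (relative to the exFAT mount point) against both limits."""
    problems = []
    for item in rel_path.split(sep):
        if len(item) > MAX_ITEM:
            problems.append(f"item too long ({len(item)}): {item[:40]}...")
    if len(rel_path) > MAX_PATH:
        problems.append(f"full path too long ({len(rel_path)})")
    return problems
```

A path made of a 200-character folder and a 205-character filename passes, even though the whole path is over 400 characters, because no single item exceeds 255.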

You can add an expression column to show invalid files. This one detects long path elements and invalid chars:

Linux version - for/paths/with/forward/slashes:
Code: [Select]
ifelse(
  Regex([filename],[\\:?*"\|<>],0),Invalid chars!,
  Compare(ListMath(ListMix(length([L1]), 0, Replace([filename],//,;)),1),>,255),Too Long!,1, OK)

Windows version - for\paths\with\backslashes:
Code: [Select]
ifelse(
  Regex([filename],[//:?*"\|<>],0),Invalid chars!,
  Compare(ListMath(ListMix(length([L1]), 0, Replace([filename],\,;)),1),>,255),Too Long!,1, OK)
Logged

drmimosa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 688

Thank you all. I will work on this on and off this week and report back any problems or challenges encountered. There is a wealth of information here. Thanks, mwillems, for the preemptive warning concerning ExFAT filesystem stability, and everyone for the suggestion to implement an SMB or NFS backup to work around OS filesystem requirements. zybex, I will use those expressions to investigate; thanks for posting.

Cheers and have a good evening.
Logged

drmimosa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 688

Ok, a near success, small failure. Like most of my adventures in Linux, ha.

Attempt 1: Rename, Move, Copy for all 31,772 audio files. It failed somewhere around file 22,000 and left the ExFAT mount in a read-only state.

Attempt 2: Rename, Move, Copy, this time with the following limits:

Directories Rule: [File Type]/[Genre]/Left([Album Artist (auto)],254)
Filename Rule: [Track #] - Left([Name],254)

This worked, but a new library importing the new folder has 31,604 files, a difference of 168 files.

Conclusions:

1. First attempt failed because RMC doesn't automatically truncate filename elements over 255 characters.
2. I probably have some duplicate albums or files in my library - or some bad files.

I tried a bunch of old ways of getting CSV album lists, but that was beyond the scope of my time right now. Any thoughts on reconciling these numbers? Or is it too time-consuming, and not really worth it in the big scheme of things, to track down 168 outliers out of 31,772?
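
One possible source of a gap like this is several source files ending up with the same destination path, so that later copies replace earlier ones. The sketch below groups sources by destination to expose such cases. It is hypothetical: `mapping` would have to come from an exported file list, which is not shown, and the case-folded comparison is an approximation of exFAT's case-insensitive names.

```python
from collections import defaultdict


def destination_collisions(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group source paths by destination and keep destinations hit more than once.

    `mapping` is source path -> destination path, however it was obtained.
    """
    by_dest = defaultdict(list)
    for src, dest in mapping.items():
        # exFAT treats names case-insensitively, so fold case before comparing
        by_dest[dest.casefold()].append(src)
    return {d: sorted(s) for d, s in by_dest.items() if len(s) > 1}
```

Summing `len(sources) - 1` over the result gives the number of files that would be lost to collisions, which can then be compared with the 168.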

Another odd thing I encountered was that the Media Center GUI froze for the entire three-hour file transfer, while MCWS stayed responsive and kept serving files. Another user could mistake this for a frozen program, so if anyone reads this and attempts a monster RMC on Linux, just be patient.

After all this I will probably take the original suggestion and redo it all on Ext4 and find a different cross-platform backup.

Thanks again for all the help and suggestions.
Logged

lepa

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1965

Also note that [Name] doesn't include the file extension, so with Left([Name],254) the finished filename can still be longer than 255 characters once the extension (and the "[Track #] - " prefix) is added.
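
The length budget can be illustrated with a short Python sketch that shortens only the title and leaves the prefix and extension intact. The function is made up for illustration and counts characters with plain `len()`.

```python
def truncate_name(stem: str, ext: str, prefix: str = "", limit: int = 255) -> str:
    """Build prefix + stem + ext, shortening only the stem so the total fits."""
    room = limit - len(prefix) - len(ext)
    if room < 1:
        raise ValueError("prefix and extension leave no room for the name")
    return prefix + stem[:room] + ext
```

With a prefix like "01 - " and a ".flac" extension, that leaves 245 characters for the title, so a rule along the lines of Left([Name],245) would be the safer limit in that case.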
Logged

David Sydney

  • Galactic Citizen
  • ****
  • Posts: 349

I am using an NTFS drive to hold my media for the same purpose. I originally accessed it from Windows, but I also exported the library over to a Linux install, so now I access the drive from either (not at the same time, though). I think macOS can read from an NTFS drive, and it should not have any issue with size or partition number. I do have a macOS system that also accesses the NTFS volume (not very often, mind you), but I don't typically write to the media files from macOS. Just as a point of reference.
Logged