I have many, many hard drives from years of data hoarding, some with duplicate pictures and videos, buried in layer upon layer of folders and subfolders from old phones, cameras, SD cards and recovered drives. It's total chaos and I don't know where to start with organising it all. Is there any software that can go through it all? Thanks

Comments (125)

Hello /u/AndypandyO! Thank you for posting in r/DataHoarder.

Please remember to read our Rules and Wiki.

Please note that your post will be removed if you just post a box/speed/server post. Please give background information on your server pictures.

This subreddit will NOT help you find or exchange that Movie/TV show/Nuclear Launch Manual, visit r/DHExchange instead.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[deleted]

I did that 10 years ago... And then the drive failed. I never got the motivation to do it again.

Better to have the data unorganized than not at all.

god that sounds laborious. is there not software out there that can identify all types of photo / video and organise by date?

LOL

I'll take that as a no haha

Not exactly what you're looking for, but WinDirStat is what I use as a developer to find lots of files on many drives. It will give you a good idea of how much stuff you have.

I was using WinDirStat at work a few days ago and the young whippersnapper who just joined us asked “what in the windows 95 is THAT?”

It’s the pinnacle of perfection, Son

WinDirStat is also very slow. WizTree can scan TBs of data in seconds, WinDirStat takes minutes.

So I went in wanting to not like it… but it’s fantastic. Portable and fast! Great suggestion

I love WinDirStat so much that I had to have a remake on my Linux machines. For similar people, give Baobab a try!

Huh cool, I'll have to give that a go! There's also K4DirStat if you have a craving for the WinDirStat style.

I prefer WizTree, it's much faster.

God do I feel this. Hahahaha

[deleted]

At this point, it is trivial to throw face detection at photos.

I really miss the old 'Picasa' app. It did that pretty well. :(

I have yet to find a suitable replacement for Picasa. I loved it for facial recognition as well as being able to make face movies

As much as you’re going to hate the company suggestion, Apple Photos does a better job. The face detection is insane compared to Picasa, and was heavily updated in the more recent versions. Lots of AI organization, and it constantly generates albums/slideshows about vacations, “best of” shots, or just related photos.

I can’t find anything on windows that matches this program.

Do you have to use an Apple computer to use it? Does it work with the pictures stored locally?

Yes to both.

It has the option for a semi online state as well. It can optionally (self-managed) offload photos and stream them in if your hard drive is not big enough, and has perfect sync to iOS/iPadOS. All face/object indexing is local and can be run at its full potential without internet. Each device runs its own indexing and they only sync manual confirmations. Both devices decide “These 72 faces are the same” and “dudeguy confirmed this one face on iPad, so the other 71 photos we linked to it must be the same”

with Google photos they can be “backed up” or not. Local or not. Or sadly “optimized”. With Apple they’re either fully in the database or not, and that database may be fully synced to iCloud, or not, such as a secondary library.

As for improvements, Apple face detection now uses the full upper body torso and will recognize people from behind, if close to another shot from in front.

It really was an amazing app. It worked so well

I ended up having to write my own overly complex importer/organizing scripts which would rename things based on the metadata and organize them as I wanted.

Things got really messy/a-complete-non-starter with software when multiple drives came into play, or an audio/video file contained a separate sidecar .txt or xml file that needed to be parsed to set the file naming.

Even now, 10+ years later, I'm constantly having to tweak and modify them based on new devices I buy. E.g., I had to change my filenames to include the time as well, a la YYYYMMDD-HHMMSS_(device model)_(file index).(extension), because my Sony A7SII will reset the file index counter if you format a card. Hence I ended up with duplicate filenames, and ones that wouldn't sort correctly if I was having a busy day and reformatting multiple cards.
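That naming scheme can be sketched as a small helper; this is a hypothetical illustration of the pattern, not the commenter's actual script:

```python
from datetime import datetime

def unique_name(taken: datetime, device: str, index: int, ext: str) -> str:
    """Build a collision-resistant filename: YYYYMMDD-HHMMSS_device_index.ext.

    Including capture time and device model avoids clashes when a camera
    resets its file index counter after a card format.
    """
    return f"{taken:%Y%m%d-%H%M%S}_{device}_{index:04d}{ext}"

print(unique_name(datetime(2018, 12, 30, 23, 0, 12), "a7sii", 4, ".mp4"))
# 20181230-230012_a7sii_0004.mp4
```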

How is device model important enough to be in the file name? If you have location included as exif data, then so is device model. Do you put in the lens too?

Also it might be time for bigger cards. I’ve got some nice reliable 256gb cards on me. But the last week long festival I shot was with a testing-rental (dual SD) R6. I tossed a 512 in one slot and thought of the 256 as just my “in case shit happens” cards.

Good question.

Oh it is really important for me. I shoot a lot of multicamera concert videos hence media is coming from multiple camera angles and audio sources.

It greatly speeds things up for me in editing by being able to look at a directory of files and instantly be able to tell which files are from which angle/camera. If I look at a list of 100 files and see that six of them have "GH5s" in them, I know I almost always use that camera for the drummer, so I can quickly put all of its files on the "drums" track of my timeline and move onto the next camera.

Especially a year later, if I'm looking to grab a particular still from a video from a particular angle, I will know what camera I shot it on and can narrow the hunting process down a lot quicker.

I always run a backup recorder in "cases of shit happens" (to at least have "something") and use the same one...hence if I see "ZoomH4n" in the filename I know I can ignore that audio since it probably isn't as good and don't need it unless there is a problem with the main audio.

Even just having just the device name isn't enough at times. I need to look to see if the GoPro 8's have a unique identifier. Sometimes I will shoot something with both of those cameras running at the same time. The file indexes sometimes overlap (but don't collide since I include HHMMSS in the filename), so it is time consuming to separate out the footage from the two cameras (especially with GoPro's odd file indexing scheme) so I can get them on a timeline and start editing.

The card issue...oddly it isn't a space issue actually, it is when I shoot on multiple days and the stuff from the first night's show goes past midnight into the next day, but then the stuff from the second night starts on the same day. This is why I had to start putting in times into the filenames also (in 24 hour format so they sort correctly). For example: I kept ending up with two "20181230_a7sii_c0004.mp4" files. One was shot around 12:30am on 12/30/2018, and the other was shot around 11:00pm on 12/30/2018. Hence adding in the time also solved this problem.

My hard-and-fast rule now is that every single file I have on my server has a unique filename no matter what folder or drive it's on. I know some people just copy the files out and organize them into folders by shoot date/topic. I did this originally, but it proved to be a pain when random files would get copied out, or if I had files copied out somewhere I wouldn't always know where they came from unless I viewed their actual contents/etc.

Most of that EXIF data is going to be lost (if it ever existed). Your best bet is to identify the drives by age (or the time period you used them) and do it that way. Start with the oldest (or newest, however you want to do it) and sort by date modified/created.

I'd start with a brand new 12-18TB hard drive and use a fast interface for the drives you're transferring from; the file transfer will probably take as much time as the sorting. Make sure to carve out a couple of weeks on your calendar. In fact, I'd probably do all of the sorting first and then start the file transfers before you go to sleep.
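The sort-by-date approach can be roughed out like this; a minimal sketch using filesystem modification times, with the folder layout being my own assumption:

```python
import shutil
from datetime import datetime
from pathlib import Path

def sort_by_mtime(src: Path, dest: Path) -> None:
    """Copy every file under src into dest/YYYY/MM based on its modification time."""
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        stamp = datetime.fromtimestamp(path.stat().st_mtime)
        target = dest / f"{stamp:%Y}" / f"{stamp:%m}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / path.name)  # copy2 preserves timestamps
```

Using `copy2` rather than `copy` keeps the original timestamps on the new drive, which matters for exactly the reason given below about attribute-preserving transfers.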

If you use an app like BeyondCompare to transfer the data, it can maintain file attributes (e.g., date created), otherwise this newly organized data will all look like brand new data on the new drive(s).

EXIF data is stored inside the file, so I'm not sure why you think it could have been lost over time. Maybe you mean filesystem metadata?

By "if it ever existed" I mean it may have never been stored at all depending on age/source. You can also accidentally strip metadata if you process the files. Filesystem attributes can be even more volatile.

I'm referring to "Most of that EXIF data is going to be lost". In OP's case, just dumping files onto drives, it seems most likely that any EXIF data would still be intact, but yes, it's certainly possible it was stripped/altered if they were processing files.

Check out phockup. It's a small command-line tool I use to sort my photos by date. But it doesn't de-duplicate except by name, I think.

https://github.com/ivandokov/phockup

What I did was simply dump everything into Google Photos and let their algo sort everything out. If the videos or images have EXIF details, they will automatically be presented in a timeline.

So it immediately serves as a backup too. It's not perfect, but their AI is pretty smart, able to search/sort by pets or bicycles too, for example.

Your mileage may vary....

Google Photos might get very expensive if you're backing up at original quality. I had to switch to Storage saver (free for Pixel users) and back up, in parallel, to Synology Photos. Synology Photos is a pretty good Google Photos replacement, but its AI is pretty bad at grouping photos by people's faces or object types.

That's not a bad idea

You can’t create a custom smart album on Google photos. It’s designed more for “let us find cool stuff to show you” rather than any form of sorting.

It’s the lowest level of organization but with the best visual indexing.

how would it do that?

even you don't know what you are dealing with until you've looked at the pictures and remembered where the heck they're from.

Even if there was software that could do the copy work for you, you would still need to tell it what to do and based on which rules.

Are your files stored on NTFS-formatted drives by any chance?

some of it is, yes

there are plenty of apps out there to delete duplicate photos, but I've never seen an "organize all this for me plz" app for photos.

I used to make good use of FastStone. It could batch-rename images based on EXIF and other variables you put in. I would rename all my photos to yyyy-mm-dd hh-mm-ss_.JPG or whatever. Worked a dream

I keep redoing the naming scheme every once in a while and creating a new mess for mobile pictures. Pictures from my mirrorless I can organize by year/month/event. But phones change every 3-4 years and I export maybe 3 times a year. It just keeps being a mess all the time.

For a master collection of photos and videos I dump everything I find into my Photoprism. For example, I tossed everything from my google takeout into it.

There's digikam if you don't want to run a server.

I use both actually and I'm pretty happy with it. I use digiKam to figure out my pictures, geotag them, fix the dates when I can and put them into folders (and it can deduplicate!). I use photoprism mostly to view them on the go.

I recently started using a MariaDB container for the DBs of both apps, since they both use MySQL. It's not a must-have but it helps me with backups and with performance.

Does photoprism store all of the photos in the database, or does it read directly from the file system and just store config/metadata in the db (like digikam)?

It only stores metadata in the database and it stores cache (thumbnails) in a local folder. That does take quite a bit of space (in my case it adds about 20% to the total storage of my pictures) but I don't mind also backing it up. I'm not sure it would be really problematic to lose it though, didn't check.

Edit: if you care, I have about 205GB of pictures, the cache folder is 46GB and the database files are 453MB (photoprism) and 460MB (digikam).

That doesn’t seem bad in the grand scheme of things. Definitely considering photoprism now

I use photoprism as well, but haven't used or configured anything besides the initial scan. 72GB photos, cache 4.5GB, db 30MB

i can't imagine the amount of cache and database files on google photos' servers, seeing how big yours are already... no wonder they stopped unlimited. they must have run out of data centers XD

anyway thank you for the software recommendations !

I would hope to God that it only stores metadata in the DB.

I have a "Trips" folder for photos and vids I take myself.

Seconding Photoprism! Have been using it for a few months and it has been excellent. Love the facial recognition features.

Thank you, that's helpful

[deleted]

Yes, czkawka is pretty good. If /u/AndypandyO is also interested in de-duplicating similar (but not exact) pictures, I recommend AntiDupl.

It'll detect bit-identical, rotated, lower resolution, even cropped depending on your settings. It'll even compare exif, dates, blockiness, blurriness, preferred directories, preferred filetypes, and a dozen other things to make intelligent recommendations on which picture should be deleted.

Yes, Czkawka can do this too, but it isn't nearly as powerful for rotated images, and stuff like that. Plus I really like AntiDupl's recommendation options.

I've used it to ingest photo archives where I knew I already had 90% of the photos, by defining the two directories, adding the new files as a "delete" directory, and just telling it to apply all recommended actions (after running it once without the new files). BOOM, all duplicate photos deleted, all new ones left.
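The near-duplicate detection these tools do is usually built on perceptual hashing. Here's a toy sketch of the "average hash" idea, operating on an already-decoded grayscale thumbnail (real tools like AntiDupl use far more elaborate comparisons, and decoding the image to pixels is assumed):

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Hash a small grayscale thumbnail: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar images."""
    return bin(a ^ b).count("1")
```

Because the hash only cares about each pixel's brightness relative to the mean, re-encoded or slightly resized copies land at distance 0 or close to it, which is how rotated/lower-resolution duplicates get caught.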

Just a thought. I am in the same predicament.

I was thinking about the following steps:

1) Make an estimate of how much data there is.
2) Add a buffer (depending on your budget), for example 30%.
3) Calculate how much hard disk space you need for a RAID configuration with ZFS.
4) Build a file server.
5) Copy each data drive to the file server into its own folder. Label the drives to match the folder names.
6) Make a database / list tree of all the files on the file server.
7) Manually sort files by extension / year.
8) Run a deduplication program on the sorted files.

The original media can act as original/backup.

Years of work for me. Piles of old floppies and CDROMs and DVDs and hard disks.

I started to be able to bring some order after I discovered hashdeep. Basically I started from a reasonably clean disk with folders to sort files into, created lists of hashes using hashdeep, then used it to scan all my existing disks for unknown files. With the correct flags, hashdeep can list all files it finds on a disk that are not in its lists already. That helped a lot to figure out what is worth spending time on. It's also useful because every now and then it makes me realize the copy of some old file I have is broken (usually, probably, because it was stored on some CD-ROM that had gone bad).
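The hashdeep audit workflow described above can be approximated in a few lines; a sketch of the idea, not a replacement for hashdeep's known-hash/audit modes:

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large videos don't exhaust RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def unknown_files(disk: Path, known: set[str]) -> list[Path]:
    """Return files on `disk` whose hash isn't in the known set."""
    return [p for p in disk.rglob("*") if p.is_file() and file_hash(p) not in known]
```

Build `known` once from the clean disk, then run `unknown_files` against each old drive; whatever comes back is the only material worth triaging.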

I use another tool called rdfind to find duplicate files and replace them with hard links to the same file, so that no space is wasted on duplicates. I know some modern file systems can do things like that automatically, but I have not progressed that far. Manually scanning for duplicates with rdfind is as far as I go for now. Probably only works on Linux. I have no idea what tools there are for Windows or if it even supports hard links properly. (Hard links are also very useful when sorting files, because if I can't decide which folder to put a file in, it probably belongs in both, so I create hard links so that it shows up in both without using twice the space.)
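A minimal sketch of what rdfind's hard-link mode boils down to (rdfind itself is smarter: it short-circuits on file size and first/last bytes before checksumming anything):

```python
import hashlib
import os
from pathlib import Path

def dedupe_with_hardlinks(root: Path) -> int:
    """Replace duplicate files under root with hard links to the first copy seen."""
    seen: dict[str, Path] = {}
    linked = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()
            os.link(seen[digest], path)  # same inode, no extra space used
            linked += 1
        else:
            seen[digest] = path
    return linked
```

Note this reads whole files into memory for hashing, so it is only a demonstration; for terabytes of media you'd chunk the reads and add the size pre-filter.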

If you already have a central file server, one approach is to simply copy each piece of media to its own place on the server, where it is backed up. Then you can kick the can down the road and deal with organizing in the future.

If you have specific media you want to organize, you can do targeted searches for file extensions and gather all of it up, perhaps saving the existing folder structure with it. Then you can work through organizing it.

If you already have a central file server, one approach is to simply copy each piece of media to its own place on the server, where it is backed up.

That's what I do. You can zip up each hard drive, SD card and so on individually, or put them into their own .iso files using Imgburn. This helps you resist the urge to delete files from inside the .zip and .iso files, so you'll never wonder if you accidentally deleted something.
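The one-archive-per-card approach zips up neatly with the standard library (Imgburn covers the .iso side; this sketch only handles the zip case, and the function name is my own):

```python
import shutil
from pathlib import Path

def archive_media(source: Path, vault: Path) -> Path:
    """Freeze one drive/card into a single zip named after it, so its contents stay sealed."""
    vault.mkdir(parents=True, exist_ok=True)
    return Path(shutil.make_archive(str(vault / source.name), "zip", source))
```

One sealed archive per physical source means you always know a card's full original contents survived, exactly the "never wonder if you deleted something" property described above.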

Duplicate photos can be a nightmare, especially when they number in the tens of thousands. I had similar issues where photo libraries got duplicated and filenames changed. It was a hot mess. I used a little utility called DoubleKiller. It will scan target directories for files that have matching CRCs and give you the option to remove the duplicates. I then used RED to remove the leftover empty folders. It works well even with files that have been renamed, as long as the file data itself has not changed in any way. Cut down what would have taken weeks to a few afternoons. Good luck!

https://www.bigbangenterprises.de/en/doublekiller/

https://www.jonasjohn.de/red.htm

Git-annex

looks interesting

It's hard at the beginning, but it pays off later. The guy who writes it does really hard work, so maybe it's a good idea to throw some money his way.

Get everything in one place, or at least centralise by category (so all photos on one mounted drive or mergerfs pool); then rdfind is a great tool to weed out duplicates. Very easy.

After that it's just taking the time to organise it all properly per your directory plan.

try dokkio!! https://www.dokkio.com/

It's free and organizes your files automatically for you with AI and also integrates w multiple diff clouds.

Works in browser/mac/windows, I don't use it much for my local files but for cloud files yes

very interesting

Does it even support local files? From an initial glance it looks like it only works with cloud services.

it does in their desktop app (https://www.dokkio.com/downloads to download it if u want) but not in the browser version. im just too lazy to download the desktop version lol but now i guess i'll try it cuz i wasn't 100% sure about the local file thing

Hydrus is very useful for organizing photos and videos. It even has a built in duplicate detector.

On import, you can have it tag all the files with whatever directory it was in (or tag it with many layers of directories).

Buy large hard drive, enough to hold all the data

Copy data from all the hard drives into the new one, with some kind of rudimentary folder structure

Dedupe the files

Organise the rest. This step might take the rest of your life.

From step 1, have a way to backup this data

https://johnnydecimal.com/ may be able to help you get in the right mindset.

I recommend a method which has already helped a lot of people. Put all the data into a ToBeSorted/(current date in YYYY-MM-DD) directory. Once it's in one pile on a big drive/pool of drives, you can start with an easy structure which you expect to use. For me it was OSes, Music, Repos, Photos, Videos. When you sort things, do it just a few minutes at a time, whenever you have free time. You can watch Netflix and sort during the less interesting parts of the show, if you do it alone.

Photos/videos can be sorted by year/month/day, Music by Genre/artist/album, whatever fits you.

If you have new data but no time to sort it, just put it into ToBeSorted with the current date. When you come back to it later, start from the most recent data: you remember it best, so it will go fastest.
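That inbox step is tiny to automate; a sketch using the folder scheme above, with `stash` being a hypothetical helper name:

```python
import shutil
from datetime import date
from pathlib import Path

def stash(files, root: Path) -> Path:
    """Move unsorted files into ToBeSorted/YYYY-MM-DD for later triage."""
    inbox = root / "ToBeSorted" / date.today().isoformat()
    inbox.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.move(str(f), str(inbox / Path(f).name))
    return inbox
```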

After some time your structure will grow in organic way which fits your brain. When you sort your collection manually you will start to know it, so you will be able to tell where you put things.

If it outgrows your current drive, you can always take a branch of your file tree and leave a note there saying where it went. My notes contain short info identifying the current location of the data and a list of the data. On most of my online devices I keep an easy file structure thanks to git-annex, but a friend keeps just text files with paths and it works too!

https://www.ghacks.net/wp-content/uploads/2008/08/cathy-disk-catalog.jpg

Cathy is a ridiculously small and fast disk cataloguing program. It's old but so effective, just give it a try and you'll see what I mean!

https://cathy.en.lo4d.com/windows Cathy can be downloaded here...

Lightroom (offline) may be a pain in the ass, but once you get into import profiles, metadata tagging, face indexing and per-building/area GPS tagging, it makes 300k+ tidbits of media easy: you can look up anyone in the family tree in 30 seconds from one database folder that can connect to a G-Suite shared drive, any local backup or an Unraid server. I used the following formatting.

For Photo/Video/Audio media folders:
Media - Person - Device - Year - Month - Day & Event Name

For media files it uses Make Model YYYY.MM.DD HH.MM.SS,
i.e. Sony ILCE-7RM3 2022.03.23 11.59.55.DNG. This lets you instantly tell what and when.

That is how I sorted years of family media. Raw photos went to DNG with full preview and originals embedded, the rest stayed as-is. Analogue media, however, is indexed per tape, with conventional and RF copies of the tapes plus index logs; it's a rabbit hole. Original digital data is a lot easier since you don't have to build metadata for it from the ground up.

Then checksums, and copies of copies. Once you've got the index, back everything up and keep it checked and in sync. Hell, burn some to 100GB M-Discs.

Table

Personally I’d inventory everything first with something like WinCatalog (I really like this app). That should help identify duplicates and different types of content, and let you create some tagging and essentially experiment with organizing it while having everything in front of you, without running the risk of actually moving data around.

I remember when I was trying to organize a huge stack of old Zip disks years ago I moved the data off and then was trying to reconcile and through one mistaken click lost it all. Whoops!

Figure out the total, then buy a NAS about 50% larger than that, with room to grow. Start a top-level directory structure that makes sense to you (pictures, series, movies, music, etc.), then plug in a drive and start dividing up the content into your new directory structure. Every now and then, run a duplicate file finder and pick the copies that can go.

Once you empty a hard drive, clean it off and test it, then use it for backup and cold storage. Make sure you label the power supply and drive enclosure so that they can be paired together again. (I like to keep both in a plastic bag together, with their contents sharpied on the bag itself.)

Divide whatever storage you have into three sections. Clean, Dirty and Working. Bring your dirty files to working and use dupeGuru or something like that to get rid of the dupes and organize/ structure the stuff. Then add it to clean. Rinse and repeat.

Before you start organizing, I would suggest backing up all your pictures and videos. Old phones, cameras and SD cards could be on the verge of failing. If they are that old, a single new large drive could probably store all of them.

Then when you have them in one or a couple of external drives you can back them up with something like backblaze.

Then start sorting them, either by theme or by date. I prefer dates, since relevant pictures usually have a specific date (birthday, anniversary, etc.). You could work on that a couple of hours every x days.

Copying to the new drives will give you an idea of what content is there. Then you can best figure out how to organize it.

There are programs that can do facial recognition that could help, but with a ton of files spanning years it's usually not that useful.

Organize things well. "Games" and "Music" folders are fundamental names, especially because a lot of programs related to those may automatically open a folder with that name when you install them. If you already have one, you don't risk derailment.

Next, keep folder names as short as possible. You don't wanna run into some BS later down the road, like not being able to transfer stuff into your main hoarding folders because you kept too many subfolders and the paths got too long.

[deleted]

It's all personal data from myself, friends and family that I have collected over about the last 15 years, so yes, I am willing to give up a lot of time and money to organise it.

do it, but beware, it will indeed take a long time. i stored my files in the worst possible place and it took 7 hours just to organize it properly. now i have it backed up on mega

If most of it is just photos and videos, I suggest using a cloud backup service like Amazon Photos (paid) or Nextcloud (self-hosted). Amazon Photos is nice because once it's organized in there, you can download it all, still organized, for a backup. Nextcloud is nice because you can install applications to help view the photos in cooler formats; for example, using the metadata it'll map out where photos were taken. Nextcloud can also store more than just photos and videos; think of it like a personal Google Drive on steroids, and they don't scan all of your uploads. You can set up Nextcloud to be only on your house network or outward-facing, depending on your security requirements and experience.

Both services will first organize by date taken. Both can be synced live with all current mobile and desktop devices, meaning that wherever you take photos you don't have to manually re-upload or sync, or deal with out-of-sync errors like other backups.

It also helps to know that you won't lose data if your house has an accident. Years of photos and videos are priceless; I would not risk having only one copy. My grandfather did that and lost almost all of his photos since 1991, which would have been all of his grandchildren's childhoods. He now has multiple backups at different locations, including the cloud.

I would avoid Google Drive/Photos and Apple Photos because they haven't been as cross-compatible with different operating systems, even though they are nice.

Edit: an added benefit of either service is that you can easily share access with family and friends.

I think this is the answer, excellent idea. thank you

By forgetting about it and moving on, hehe...

The way you put it sounds more like it's something that's entirely up to you (as opposed to something I don't know like "how do I ask youtube-dl to make nice directories one per channel and names that contain the date and title").

Also years of data LOL, it's been "years" since the pandemic started. The only people who think "years of data" is something worth mentioning either got their first digital device a few years back (so they're like what, under 10?) or somehow manage to lose all their data every other year.

For someone like me who started data hoarding around the pandemic, a few years of covid were more than enough to amass about 14 TB.

about 15 years' worth of data. I asked here because I wasn't sure what terms to google; the names of all the files are pretty random since they were made on old devices.

I felt compelled to jump into this thread because I'm facing similar issues. It seems your core problem is sorting through photos and videos from older devices. I've been using Directory Opus for some time. It's not an automated way to solve your problems, but it offers a few functions that I found very useful:

  • Compute the size of folders and subfolders.
  • Rename files according to their metadata. To me this is key for organizing photos and videos from old phones, because they usually have nondescript filenames. I rename them using the time they were taken using a YYMMDD-HHMMSS format (e.g. C0036.MP4 becomes 091219-163120.MP4). It allows me to group and sort them easily afterwards.
  • Identify duplicate files, even when they have different filenames (using a slow but very reliable md5 hash method). Be careful to understand how to delete the duplicates though.

Possibly adobe Lightroom would do what you want...give the free trial a try

But does it handle video very well?

I remember I posted something similar years ago, looking for a similar solution, and multiple people offered suggestions, but one caught my eye. It was completely different from all the others. It basically said to let it all go. Imagine if all of the data was deleted; how would it affect your life? Stop trying to cling to things. After that, I basically didn't do anything with the data, realized I hadn't gained anything, and deleted it.

Dude, these are OP's personal memories!

Yeah, no... that's not an option. If there was a fire, the one thing I would grab is my hard drives full of irreplaceable memories.

well that's extreme.

Once upon a time, I had a junk-drawer script that would iterate and sort by parameter; the defaults were create date and file format.

Buy a Synology NAS and copy all content over? Run Storage Analyzer and dedupe all duplicates.

For photos, you can run an app that will rename every photo using its EXIF date and time. This should eliminate duplicate photos because of the name. Then create year folders (2019, 2020, ...) and move all photos into each year.
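The rename-by-EXIF-date idea boils down to a tiny transformation. Actually extracting the tag needs a library like Pillow or a tool like exiftool; in this sketch the tag value is simply passed in as a string, and the function name is my own:

```python
from datetime import datetime

def exif_to_name(exif_dt: str, ext: str) -> tuple[str, str]:
    """Turn an EXIF DateTimeOriginal value ('YYYY:MM:DD HH:MM:SS')
    into (year folder, sortable filename)."""
    taken = datetime.strptime(exif_dt, "%Y:%m:%d %H:%M:%S")
    return f"{taken:%Y}", f"{taken:%Y%m%d_%H%M%S}{ext}"
```

Because the filename is derived from the capture time, two copies of the same shot collapse to the same name, which is what makes the dedupe-by-name trick work.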

what's the app called that will rename everything?

I think it's called advanced file renamer https://www.reddit.com/r/photography/comments/5okk27/great_software_for_bulk_file_renaming/

Great, thanks

Many years ago I used a free app that let you filter images that were close to each other on an image basis, that is, not by size or weight. Totally recommend it to tackle your problem.

Also, the same applies to another app I used to rename MP3s according to their metadata.

Folders

Load the drives into Neofinder and use that as your master catalog.

dates, sizes, types, names, any others I'm missing?

What a challenge. <marie kondo>I love mess!</marie kondo>

I assume you don't even know what kind of files you have. Although I'm not a proper data hoarder like many folks here, I'll give my two cents.

1- Get a bigger drive.

2- Try to put the HDDs in order.

3- Copy the contents of one HDD into the bigger drive.

4- Start sorting, in the biggest "chunks" you can find. Found a folder with personal pictures? Put it in the "Pictures" folder. Found a DivX movie? Create an appropriate folder. Found something you're not sure about? Put it in a "Sort later" folder.

5- Put the HDD away and mark it with a post-it or something. Time for the next one. Repeat the steps, creating new folders as needed.

When you're done, you'll have easier chunks to sort out using different techniques. Personal photos can be sorted with some tool mentioned here (I'd check their EXIFs and maybe change their filenames to something like 2009-04-30_15-33-01.jpg, but that requires some command line knowledge and maybe you don't know how to do it). Music MP3 can be sorted with iTunes (I like it, ok?). Divide and conquer.

After that, go for the "Sort later" folder. Good luck with that.

If there's too much data, you can start sorting each drive contents trying to follow the same logic for each one, and then collecting all the same type of folders together.

I don't know all the file types I have, but it's safe to assume I have every common video, picture, and music file type.

I sort of started doing this process but got all tied up in knots and gave up pretty quickly

I don't know if it's your case, but semi-manual sorting can be needed because sorting by file type can destroy some relationships. Say you've got a folder with work stuff that includes images, videos and audio files. They should stay together, not be scattered among songs or family pictures. That's why I think you should first sort things like photos you want in a library, or movies to go to a Plex server, or songs to be sorted with some other media manager. Files with an obvious destination should be easier to sort, and then you can concentrate on the files that need more attention.

I was wondering if there's some way of checking file metadata. Say you have a semi-sorted folder with pictures, and you want to be sure that every picture inside it was taken with a camera or phone, so it can detect if there are files that are images but shouldn't be there.
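One hedged way to do that check: most EXIF readers hand you a tag-name-to-value mapping, and genuine camera/phone shots usually carry Make, Model, and DateTimeOriginal, while screenshots and web downloads usually don't. A sketch operating on such a mapping (the exact key names depend on which EXIF reader you use, so treat these as assumptions):

```python
def looks_like_camera_photo(exif_tags: dict) -> bool:
    """Heuristic: real camera/phone photos usually carry Make, Model,
    and DateTimeOriginal tags; screenshots and downloads often don't.
    exif_tags is a tag-name -> value dict as an EXIF reader would
    return it (assumed shape)."""
    wanted = {"Make", "Model", "DateTimeOriginal"}
    return wanted <= set(exif_tags)

print(looks_like_camera_photo({"Make": "Canon", "Model": "EOS 450D",
                               "DateTimeOriginal": "2009:04:30 15:33:01"}))  # True
print(looks_like_camera_photo({"Software": "Paint"}))  # False
```

It's only a heuristic: some editors strip EXIF on export, so a False just means "look at this one by hand".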

Download Duplicate Cleaner Free. Tell it which drives/folders to search, it will find all copies that have the same metadata, and then you can tell it to delete all but one copy, or other options.

You will still have folders in folders and empty folders but at least you won't have duplicates.
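If you'd rather see what that kind of duplicate matching looks like under the hood, here's a hedged Python sketch of the usual approach: group files by size first (cheap), then confirm with a content hash, so only same-size candidates ever get read in full.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: Path) -> list:
    """Return groups of files under root with byte-identical contents.
    Size is used as a cheap pre-filter; SHA-256 confirms the match."""
    by_size = defaultdict(list)
    for f in root.rglob("*"):
        if f.is_file():
            by_size[f.stat().st_size].append(f)
    by_hash = defaultdict(list)
    for size, files in by_size.items():
        if len(files) < 2:
            continue  # unique size, can't be a duplicate
        for f in files:
            by_hash[hashlib.sha256(f.read_bytes()).hexdigest()].append(f)
    return [group for group in by_hash.values() if len(group) > 1]
```

For multi-GB video files you'd want chunked reading instead of `read_bytes()`, but the size-then-hash structure stays the same.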

Combine everything in one folder. Delete the empty ones. Sort into folders from there, and sort as you add new data so you don't have to do it again in the future.
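The "delete the empty folders" step is easy to script. A minimal sketch: walking bottom-up means a folder that only contained other empty folders gets emptied and removed in the same pass.

```python
import os

def prune_empty_dirs(root: str) -> int:
    """Walk bottom-up and delete directories that are empty,
    including ones that only held other empty directories.
    Returns how many were removed; root itself is kept."""
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed += 1
    return removed
```

`os.rmdir` refuses to delete anything non-empty, so this can't eat real data even if the emptiness check were wrong.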

Maybe look at adding them to a NAS and running a few different docker containers to sort your files.

Don't worry about duplication too much.

1) Get a NAS, gather your data there. Divide into folders, use date-stamps as sub-folder names with description where possible.

2) Add online backup to NAS - for instance BackBlaze.

Or if you don't need a NAS - just add a big external drive somewhere if that is sufficient - but still use online back up too.

I do intend to get a NAS but they're pretty pricey. I'm looking into DIY options.

Well, just a simple, big external drive is actually fine - if you combine it with online backup.

You may be interested in this related subreddit: r/datacurator.

Ah I've not come across that subreddit before so thanks for the suggestion.

I've looked through the answers (97 comments by now; I've read many of them) and maybe it's obvious, maybe not, but you didn't say why you want to keep it all, what for?

If it's just-in-case data hoarding, and you have the budget for drive space, maybe keep it as it is?

If you want to save space, then removing duplicates will help, but just running software over the whole thing may, AFAIK, break connections (that is, a set of files that belong together in a folder might lose some members).

For myself, I wanted to sort photos by tagging each with several categories like family, vacation, business trip, names of cities, etc., but I have not found software to tag them quickly even manually (I don't want to pick just one category, i.e. put each photo in a single folder).

Also, you asked for software, but many answers advise on hardware and how to bring it all together; those answers are generally useful IMO if you need that step.

P.S. I'm going through similar process myself, it is a huge project for me, I move forward, become bored and tired of so many files to review, pause for some weeks/months, continue.

If you get good suggestions on software to automatically sort it out and it works out for you, please let me know. TIA

I want to keep it because I have a long history of building/making things, art projects and DIY. I'm trying to document them all. Also lots of childhood pictures and videos I would like to see again. Some of lost relatives.

So the data needs to be accessible and organized, ideally accessible for multiple people.

I'm realizing now that unless the pictures have metadata on them, I'm not going to get any dates from them. But I might be able to narrow down the year.

My main issue is that there is so many layers of files, and backups within backups within backups.

I've had lots of really helpful information here so thanks to everyone, I'll try a few and see what's best for me.

From what you wrote I don't see you insisting on multiple tags per picture. In that case, just sorting them into folders would do. As for dates to help with sorting, do the file modification dates mean nothing on your disks?
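If the modification dates are usable, a rough first pass is to bucket files into year folders by mtime. A sketch, with the caveat baked into the comment: copies and recovered files often carry the date of the copy, not of the original shot.

```python
import shutil
from datetime import datetime
from pathlib import Path

def sort_by_mtime_year(source: Path, dest: Path) -> None:
    """Move files into dest/<year>/ based on modification time.
    mtime is only a rough guide -- copies and recovered files often
    carry the date of the copy, not of the original."""
    for f in source.iterdir():
        if not f.is_file():
            continue
        year = datetime.fromtimestamp(f.stat().st_mtime).strftime("%Y")
        target = dest / year
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), target / f.name)
```

Even when the year is off, this still splits one huge pile into a dozen smaller ones you can review separately.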

As for removing duplicates, I looked into it some time ago and the apps I found could only search for dups inside one location, and also might require manual confirmation of deletions. So I wrote a Python script for myself to delete dups in one location against another (keep the sorted folder, delete dups in the unsorted one); I check optionally by date, checksum, etc. I'm still thinking about uploading it to e.g. GitHub.
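For anyone curious, the idea behind such a script (this is my sketch of the concept, not the commenter's actual code) fits in a few lines, with a dry-run default so nothing gets deleted by surprise:

```python
import hashlib
from pathlib import Path

def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def delete_dupes_against(reference: Path, unsorted: Path,
                         dry_run: bool = True) -> list:
    """Delete files in `unsorted` whose contents already exist anywhere
    under `reference`. With dry_run=True (the default), nothing is
    deleted; the list of would-be victims is returned either way."""
    known = {_sha256(f) for f in reference.rglob("*") if f.is_file()}
    doomed = [f for f in unsorted.rglob("*")
              if f.is_file() and _sha256(f) in known]
    if not dry_run:
        for f in doomed:
            f.unlink()
    return doomed
```

Run it once with `dry_run=True`, eyeball the list, then run it again with `dry_run=False`.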

Tagging would be nice but it seems like more work, and at this point I really don't need that. There are dupes of backups, and the dupes have different dates, so ideally I would delete the newer version.

Your script sounds interesting but I'm not that great with Python.

If you're still working on this I call out Clonespy for duplicate processing.
It's an old but great duplicate detector for files of all sorts. It's unusual in that you assign two or more folders as 'pools' and then tell it what to do. Like delete duplicates but only from Pool 1. Or delete within Pool 1, keeping the one with the newest date, or shortest path, or longest file name...
You can have it prompt you for each set of duplicates and manually choose which to kill, or have it create a batch script to do the deletion at the end, or just trust it and let it delete on the fly.
I've used it for years, and never had a problem.

How do you eat an elephant? One bite at a time. I just copied 25 years of audio and video to a new NAS I built for my studio, and luckily everything was labeled chronologically from the start (yyyy-mm-dd client-location). If you are in Windows, the bonus is the "created" date column. Then organize by file type. Likely a long, ongoing process, but you'll thank yourself in the end

You can hash each picture's contents with Python and add the hashes to a set; if a hash is already in the set, the file is a duplicate. (Just converting a list of filenames to a set won't catch duplicate content, since the paths are all different.)

I have data from drives from the early '00s. Use dupeGuru, look for 100% matches, and possibly find whole redundant file structures to delete. https://dupeguru.voltaicideas.net/. I've been chipping away at it for years, still not done. Also, I'm trying to organize my data into priority tiers of backup importance.

video comparer - for finding dup videos
visually similar duplicate image finder - for finding dup images

I believe both have lifetime license options, which I would say are worth it if you have lots of data.

Much easier to organise from the start; learnt that years ago.

Duplicate Files Fixer