File attachments pruned - let me know if you see missing images.

neptronix

Administrator
Staff member
Joined
Jun 15, 2010
Messages
17,528
Location
Utah, USA
Hey all. I'm preparing to do a real big server upgrade and move for ES.

In PHPbb, there's an option to 'delete orphaned attachments'. I run this once a year to keep our file storage down to a rational size.
I suspect this feature is imperfect and deletes images that are in actual posts.

The era of posts affected by this is 2019-2021.
Let me know if you suddenly see something missing, i.e. a section of a thread's images has gone byebye.

I will be keeping a copy of the images from before this change for at least 6 months, so if we have issues, this is revertable.
 
That's thumbnail corruption in that thread, unfortunately. (On our to-do list is to unassign all the thumbnails, since the thumbnail functionality in phpbb is bad: it doesn't accurately create or maintain them.)

You can click on the broken image and get the original, so that's not the issue I'm looking for.

If an image is truly missing, you can click on it and see no trace.
 
Image 1 is broken:
file.php


Image 2 works:
file.php


I only clicked image 2, my bad.

Anyway, did both of these things just happen at the same time, or separately?
 
I finally sort of got to the bottom of this by disabling thumbnails and an hour of investigation.

The one file still missing from your post isn't on the disk anymore.. :roll:
You may have been unlucky enough to have uploaded it during one of the episodes of the disk being full.

Apparently phpbb writes the database entry for the file, then writes the file, and never checks whether the file write was successful. It also doesn't check whether the thumbnail was created correctly. So if you had a transient database or server issue, that explains a lot: any time you have a system fault, you get weird corruption like this.
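
For illustration only, here is a minimal Python sketch of the opposite ordering: write the file, confirm it landed on disk, and only then add the database row. This is not phpbb's or xenforo's code; the `attachments` table, its columns, and the `store_attachment` helper are hypothetical names made up for the example.

```python
import os
import sqlite3
import tempfile


def store_attachment(db, upload_dir, name, data):
    """Write the file first, verify it, and only then record it in the database."""
    os.makedirs(upload_dir, exist_ok=True)
    final_path = os.path.join(upload_dir, name)

    # Write to a temporary file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=upload_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # a full disk should surface here as OSError
        # Check what actually landed on disk before trusting it.
        if os.path.getsize(tmp_path) != len(data):
            raise OSError("short write for " + name)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial file and let the caller see the failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Hypothetical table: the row is inserted only after the file is complete.
    with db:
        db.execute(
            "INSERT INTO attachments (filename, size) VALUES (?, ?)",
            (name, len(data)),
        )
    return final_path
```

With this ordering, a failure between the two steps leaves at worst a file with no database row, which is harmless, rather than a database row pointing at a file that does not exist.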

Well anyway.. 38761 attachments had their thumbnails turned off and it is statistically more likely that the thumbnail will be missing than the images, so this means some images will be restored at least.
 
Another reason for the missing files is that when I've had to restore from backup, I cannot copy the folder because it has too many files in it ( >250,000 ). So I often move the entire folder from backup instead. The backups also occasionally fail to copy a file or two in our huge images folder, so there is a tiny amount of data loss from this (~0.00001%).
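
A rough sketch of how a backup could be checked for files that were silently skipped, assuming both the live folder and the backup are reachable as ordinary directories. The function names and paths are made up for illustration, and comparing sizes is only a cheap stand-in for comparing checksums.

```python
import os


def relative_files(root):
    """Return {relative_path: size_in_bytes} for every file under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            found[os.path.relpath(full, root)] = os.path.getsize(full)
    return found


def verify_backup(source_root, backup_root):
    """List files missing from the backup and files whose size differs."""
    source = relative_files(source_root)
    backup = relative_files(backup_root)
    missing = sorted(set(source) - set(backup))
    size_mismatch = sorted(
        path for path in source
        if path in backup and source[path] != backup[path]
    )
    return missing, size_mismatch
```

Calling something like `verify_backup("/path/to/files", "/path/to/backup/files")` (both paths hypothetical) right after each backup would report a skipped file while the original is still on disk to re-copy.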

I don't know if Xenforo has this low fault tolerance when it comes to uploading images. I am thinking that to maximize data integrity and be prudent, posting and uploading once we move to xenforo should be disabled while file and database backups are run.

Xenforo also has options to automatically locally grab images that are uploaded and self host them for the sake of preservation. This will prevent another era of content from going missing.

Will make sure this issue halts in the near future.
 
...yes, 99% of the images we've lost are due to them being hosted elsewhere.

Xenforo is extremely forward thinking compared to phpbb by having features like this out of the box... and many more.
 
I have some after the fact analysis on this after looking at phpbb's code.

phpbb has a feature to prune unused images, but it is not fully accurate at establishing what is referenced in the database and what is not.
Therefore it would occasionally sacrifice a file for no perceivable reason.

In an attempt to save CPU power, phpbb does not fully search the database for mismatches between a file and the references to that file in posts.
It uses its own markers, which is where the problem starts.
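
As a sketch of the more conservative approach, a prune could cross-check every file marked as orphaned against the references actually found in posts, and refuse to delete anything the two sources disagree on. The data shapes below (`marked_orphan`, `attachment_ids`) are hypothetical simplifications for the example, not phpbb's real schema.

```python
def split_prune_candidates(attachments, posts):
    """
    attachments: list of dicts with 'id', 'filename', 'marked_orphan'
    posts: list of dicts with 'attachment_ids' (ids each post really references)

    Returns (prune, disputed): files safe to delete, and files that are marked
    orphaned even though a post still references them.
    """
    # Full scan: collect every attachment id that any post refers to.
    referenced = set()
    for post in posts:
        referenced.update(post["attachment_ids"])

    prune, disputed = [], []
    for att in attachments:
        if not att["marked_orphan"]:
            continue  # never touch files the marker says are in use
        if att["id"] in referenced:
            disputed.append(att["filename"])  # marker and posts disagree: keep it
        else:
            prune.append(att["filename"])
    return prune, disputed
```

The full scan costs more CPU than trusting the marker alone, but anything landing in `disputed` is exactly the kind of file that would otherwise be deleted out of a live post.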

So the reason we've lost a certain % of our images over the ages is because of use of this function.
The whole reason that function exists is because phpbb doesn't always properly remove or add post images.
Basically the code base is flaky as hell when it comes to writing images with high integrity.

Lesson learned the hard way :|


We now have better ways to manage image storage ( I wrote ~700 lines of code that automatically optimizes our image store, and we got a >50% reduction of our file storage through that ) and will also check out xenforo's code for similar defects.

We're hoping the xenforo platform has much higher data integrity. If it does not, I plan to develop some aftermarket tools to 'coax' it into behaving well. It may be that we only lost ~3% of our images to this feature, but that's still too much.
 
neptronix said:
Hey all. I'm preparing to do a real big server upgrade and move for ES.

In PHPbb, there's an option to 'delete orphaned attachments'. I run this once a year to keep our file storage down to a rational size.
I suspect this feature is imperfect and deletes images that are in actual posts.

The era of posts affected by this is 2019-2021.
Let me know if you suddenly see something missing, i.e. a section of a thread's images has gone byebye.

I will be keeping a copy of the images from before this change for at least 6 months, so if we have issues, this is revertable.

Just ran across this since i've been mostly out of the world for a long time and still catching up.

There are quite a few images missing, probably all from that time period though i don't know for sure, in various threads of mine (primarily the long ones like the housefire thread and the sb cruiser and crazybike2 threads). But some were missing before the date of your post, and i don't have any way to tell which were missing before and which are only missing now. Do you have any suggestions?
 
I don't have a good way to find out. It means comparing two sets of 7 GB of text data, many lines of code, and a long testing process, and only lately did I get the kind of power to run such a comparison. Writing the code for an integrity check is a PITA and I'd only use it one time.

Image corruption is endemic in the phpbb platform, and I've discovered a couple of functions and program designs that were responsible for a good % of our loss over time. Our total image loss ( that I know of ) is 3% of the forum's content, more so in older content than newer.

It's unfortunate but not worth fixing. However, when we are on xenforo, I do plan to write an image integrity checker to spot anything like this happening in the early phase, because we do not get timely user reports here and usually we are way out of the window of backup time to do forensics on the issue.
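
A very small sketch of what such an integrity checker could look like, assuming the attachment records can be queried as rows of id, filename, and expected size. The `attachments` table and its columns are hypothetical stand-ins, not xenforo's actual schema, and SQLite is used only to keep the example self-contained.

```python
import os
import sqlite3


def find_broken_attachments(db, upload_dir):
    """Return (id, filename, problem) for every row whose file is absent or truncated."""
    problems = []
    rows = db.execute("SELECT id, filename, size FROM attachments ORDER BY id")
    for attach_id, filename, size in rows:
        path = os.path.join(upload_dir, filename)
        if not os.path.isfile(path):
            problems.append((attach_id, filename, "missing"))
        elif os.path.getsize(path) != size:
            problems.append((attach_id, filename, "size mismatch"))
    return problems
```

Run on a schedule shorter than the backup retention window, a report like this would flag a lost file while a good copy still exists in backup, instead of relying on user reports that arrive too late.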
 