I recently came across a situation where one of our Umbraco editors had to do some work for a client. The client had some financial information in PDF files that had been uploaded to the media folder of their Umbraco site. This information was either out of date or linked to newsletters that were no longer relevant, or no longer even on the site.
The client had stumbled across the documents while doing a Google search on their site and asked our editor to remove them, which the editor promptly did. However, a few weeks later the client ran the same Google search and found the documents still available. Eeek! Needless to say, the client wasn’t happy, and we looked foolish.
It turns out that although our editor did a straightforward delete and the link was gone, the associated file was still languishing in the media folder. This happened because Umbraco effectively does a soft delete: it moves the media item to the “Recycle Bin” rather than removing it. The associated media files are only actually deleted when the “Recycle Bin” is emptied (or when that specific item is deleted from the “Recycle Bin”). Until then, anyone with the file’s URL, including Google, can still reach it.
Once we realised this, we removed the offending files and they were no longer accessible from Google (Google still held them in its index for a few days until it crawled the site again).
The moral of the story: when you delete media in Umbraco, “empty your Umbraco Recycle Bin” too.
The problem in my other post, about duplicate hostnames, is another symptom of this same soft-delete behaviour.