
How I Back Up My Data

This is basically a guide on how to hijack all of my data.

By Jacob Marciniec
[Image: a computer window with a directory named "pumpkin-memes" and a cursor hovering over the directory, with a context menu that only shows the option to copy the directory.]
You must protect your pumpkin memes at all costs.
Graphic created by Jacob Marciniec, directory icon from Google's "Material design icons" (original modified)
Published: October 31, 2020
Last edited: November 1, 2020
Hi.

After suffering complete losses of all my documents, photos, etc. multiple times... I now spend lots of time thinking about how to best keep backups of all my data.

It's kind of become a hobby of mine.

I have a huge and constantly growing archive of photos and videos that currently takes up over 2 TB. I expect this to grow to over 10 TB in the next 1–3 years and only continue growing from there.

Keeping all this data safe and easily accessible is no small task, but I think I do it well.

This month, I compiled all of the practices that I currently employ into a single document.

My data storage workflow protects against just about any "doomsday" scenario you can imagine. These scenarios fit into 2 broad categories:

1. Losing all of my own personal data storage devices (storage servers, external hard drives, etc.) through e.g.

  • Some evil person physically stealing all of my data storage devices.
  • All of my data storage devices simultaneously failing irrecoverably.
  • A disaster (house fire, flood, etc. — knock on wood) destroying all of my data storage devices.

2. Losing all of the data that I store online in cloud services through e.g.

  • Some evil person "hacking" into any of my online devices and deleting all of my data.
  • Some evil person "hacking" into any of the cloud providers storing my data and deleting all of my data.
  • Any or all of the cloud providers storing my data randomly disappearing entirely off the face of the Earth without a trace.

As long as 2 of these things (1 from each category) don't happen at essentially the same time, I will not lose any significant amount of my data.

TL;DR

Long story short: I keep a copy of all my data in 3 distinctly different places. I can completely lose 2 full copies of all my data and still be able to recover all my data from the remaining location in a short period of time.

I think this document will be useful for:

  • myself to refer back to,
  • an evil person who wants to sabotage all my data,
  • anybody who is serious about backing up their data but isn't sure how to, and
  • people who will work with me in the future and need to access/work with my data.

I will probably change and/or improve this document with time, but I will keep this version here, as is, unchanged for archival purposes.

I hope you find it helpful. ♥

Jacob Marciniec's Data Storage Policy

How my data is created and backed up on a regular basis.

Important data storage and transfer notes

  • Everything is a file. All data kept for back-up purposes should be stored in files of standard, easily manipulated formats (CSV, JSON, DOCX, etc.)

  • Never make copies of copies. When copying files to multiple locations, always copy the original source files to each destination individually.

  • Always verify that source and destination directories are identical. After manually copying files to new locations, run the diff command in a Linux terminal to verify that the source and destination files are the same. Example of this command:

    diff -r path/to/source/directory path/to/destination/directory
  • Device backups should always be kept in a single-file archive (e.g. 7Z, TGZ, or ZIP; compression is neither necessary nor preferred). A command-line sketch of creating one is shown below.
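
For example, here is a minimal sketch of creating such an archive with 7-Zip's command-line tool (the archive name and path are just examples, and this assumes the 7z command is available in your terminal):

# bundle a device backup into a single uncompressed 7Z archive
# -mx=0 sets the compression level to "store" (no compression)
7z a -mx=0 2020-10-26_Jacobs-iPhone-SE_full-backup.7z path/to/iphone-backup/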


Locations of all important data

All important data is essentially kept in 3 physical locations:

  1. Local online storage — drives in a local NAS server (SOON TO BE ADDED)
  2. Cloud online storage — drives on 3rd-party servers (currently Google Drive)
  3. Emergency offline storage — stand-alone drives in external enclosures

Data storage location notes

  • If data is not in any of the locations listed above, it is expendable. It's dead to me. It does not exist.
  • Local online storage and cloud online storage should be treated as a single location. They should always automatically sync with each other.
  • These are the 3 main physical locations where we store data, but all data exists in 1 logical location. I.e. ideally, each location is just an exact copy of the same data at all times.
  • ... but this system is not ideal. Some data needs to be manually synced on a regular basis (the manual syncing process is covered later).

Logical organization of data

Logically, data is organized into 3 directories:

  1. "Workspace" — files that are small and/or referred to often, e.g.
    • spreadsheets,
    • video project files,
    • video scripts,
    • product manuals,
    • etc.
  2. "Sharing" — files that need to be shared with outsiders,
    • client projects,
    • files for collaborations,
    • documents for accountants,
    • etc.
  3. "Storage" — a huge, ever-expanding archive of space-hogging files that do not necessarily need quick and easy access, e.g.
    • raw and/or unedited photos,
    • raw and/or uncut videos,
    • device backups,
    • service backups,
    • etc.

Data that requires manual syncing

The following data needs to be manually synced on a regular basis:

  • From local machine to online storage and emergency offline storage.
    • On my local machine for daily use, I keep one encrypted directory which contains small, high-security files. This is the only directory that needs to be synced manually. This directory should ONLY be opened locally and NEVER touch a cloud server in an unencrypted form.
  • From 3rd party services to online storage and emergency offline storage.
    • Website and blog content.
    • Contacts from:
      • Google Contacts
      • MailChimp
    • Notes from Google Keep.
  • From 3rd party services to emergency offline storage.
    • Passwords from 1Password.

Data storage workflow

Regular files created/edited/downloaded

Most files should be created and edited on, downloaded to, and accessed from local online storage, i.e. the local NAS server.

  • These files are automatically synced with cloud storage.
  • These files are manually backed up on a regular basis to emergency offline storage.

Files for sharing with clients/collaborators

Shared files should be created and edited on, downloaded to, and accessed from online cloud storage (currently Google Drive), i.e. the same place where the parties they are shared with will access them.

  • These files are automatically synced with local online storage.
  • These files are manually backed up on a regular basis to emergency offline storage.

Video/image files from cameras

Still image and video files from shoots are saved to memory cards in cameras.

  • These files should be immediately uploaded after each shoot to local online storage and emergency offline storage (a sketch of this is shown below).
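
As a rough sketch of that upload from a Linux terminal (the mount points and folder names here are hypothetical), copy the memory card to each destination directly from the source, then verify each copy, per the notes above:

# copy the shoot from the memory card to local online storage (the NAS)...
rsync -av /media/sd-card/DCIM/ /mnt/nas/storage/2020-10-26_shoot/
# ...and, separately, from the same source to emergency offline storage
rsync -av /media/sd-card/DCIM/ /mnt/offline-drive/storage/2020-10-26_shoot/
# verify that the source and each destination are identical
diff -r /media/sd-card/DCIM /mnt/nas/storage/2020-10-26_shoot
diff -r /media/sd-card/DCIM /mnt/offline-drive/storage/2020-10-26_shoot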

High-security files

High-security files are put in a single, "high-security" directory on the local machine. This directory should ONLY be opened locally and NEVER touch a cloud server in an unencrypted form.

  • These files are manually backed up on a regular basis to online storage and emergency offline storage drives.

Temporary files and copies of media

Anything kept on a local machine but not in the high-security directory is considered a temporary file.

Temporary files include:

  • copies of video files used to edit together a video on the local machine
  • a product manual that was downloaded in order to be read one time
  • a screenshot that will not be needed in the future
  • a program installer
  • etc.

Temporary files are regularly deleted.

If something should not be deleted, it must be put in the high-security directory or on the local NAS.

---

Regular manual backup procedure

This is the standard backup procedure that should be done on a regular basis (no less than once a month). It syncs all files and data that are not automatically synced.

This procedure assumes you are using Windows 10, since unfortunately, that's what we're using right now.

For each item in this procedure, create a folder.

The folder's name should include the following information:

  • day's date
  • broad name of data/service
  • specific descriptor of data

... in that order, separated by underscores.

Only use dashes (-), underscores (_), and alphanumeric characters (A–Z, a–z, 0–9) in file names.

Examples of folder names:

  • 2020-10-26_Mailchimp_contacts-exports
  • 2020-10-26_Google_full-takeout
  • 2020-10-26_Jacobs-iPhone-SE_full-backup
  • 2020-10-26_jacobmarciniec-com_full-website-snapshot
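
If you create these folders from a Linux terminal, the date portion can be generated instead of typed by hand (the rest of the name below is just an example):

# create a backup folder stamped with today's date in YYYY-MM-DD format
mkdir "$(date +%F)_Mailchimp_contacts-exports"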

1. High-security files on local machine

Back up the high-security directory.

  1. Right-click on the high-security directory.
  2. Click "7-Zip".
  3. Click "add to archive".
  4. Apply these key settings:
    • compression level = store (no compression)
    • enter a password for encryption
    • encryption method = AES-256

Copy the archive that is created to online storage and emergency offline storage.
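
A rough command-line equivalent of the steps above, assuming 7-Zip's 7z command is available (the archive name and path are only examples; 7z will prompt for the password):

# encrypted, uncompressed 7Z archive of the high-security directory
# -mx=0 = store (no compression); -p = prompt for a password (the 7Z format encrypts with AES-256)
# -mhe=on = optionally also encrypt the list of file names inside the archive
7z a -mx=0 -p -mhe=on 2020-10-26_high-security_full-backup.7z path/to/high-security/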

2. Workspace directory

Back up the workspace directory.

  1. Right-click on the workspace directory.
  2. Click "7-Zip".
  3. Click "add to archive".
  4. Apply these key settings:
    • compression level = store (no compression)

Copy the archive that is created to online storage and emergency offline storage.
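
The same back-up as a rough command-line sketch (archive name and path are examples); the sharing directory in the next step is archived the same way:

# uncompressed, unencrypted 7Z archive of the workspace directory
7z a -mx=0 2020-10-26_workspace_full-backup.7z path/to/workspace/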

3. Sharing directory

Back up the sharing directory.

  1. Right-click on the sharing directory.
  2. Click "7-Zip".
  3. Click "add to archive".
  4. Apply these key settings:
    • compression level = store (no compression)

Copy the archive that is created to online storage and emergency offline storage.

4. Contacts

Export contacts from the following services according to the instructions provided by the respective services:

  1. Google Contacts
  2. MailChimp

Save the plaintext structured data files (CSV, JSON, whatever) that are created to online storage and emergency offline storage.

5. Google account

Do a Google "Takeout". Search Google for "Google Takeout". The process should be pretty easy.

Generally, we want to export everything except for data from a few key Google services that is already backed up.

Be especially sure to export:

  • Google Keep notes
  • Google Calendar
  • Google Contacts
  • Google Mail (Gmail)

... but do not export:

  • Google Photos
  • Google Drive
  • YouTube videos

... these libraries are huge and already backed up.

Save all the files created by Google Takeout to online storage and emergency offline storage.

6. Website

Back up the most important parts of our website(s). Repeat the following steps for each website.

Currently, these are our websites:

  • jacobmarciniec.com

Log into the Webflow account and export all data from every "collection" using the Webflow CMS.

Download a full "snapshot" of the website (i.e. create a local copy of every webpage on the website and all the resources needed to properly display every webpage) from a Linux terminal with the following commands.

# get sitemap
wget -O sitemap.xml --compression=gzip https://www.jacobmarciniec.com/sitemap.xml
# extract all URLs from sitemap
grep -Po 'https?://[^ "()<>]*' sitemap.xml > sitemap.csv
# download all pages and requisites
wget --span-hosts --page-requisites --convert-links --backup-converted --wait=2 --random-wait --limit-rate=25k --no-cookies -e robots=off --compression=gzip -i sitemap.csv --rejected-log=wget-log.txt

wget command explained:

  • --span-hosts - allow downloading resources from any domain
  • --page-requisites - download all resources needed to display a page
  • --convert-links - after download, convert links for local viewing
  • --backup-converted - backup original files before converting links
  • --wait=2 - wait 2 seconds between downloads
  • --random-wait - vary the wait time randomly between 0.5x and 1.5x of the value given by --wait
  • --limit-rate=25k - limit download speed to 25KB/s
  • --no-cookies - disable cookies
  • -e robots=off - ignore robots.txt and "nofollow" links
  • --compression=gzip - request compressed files and decompress them
  • -i sitemap.csv - download all pages listed in this file
  • --rejected-log=wget-log.txt - log all failed downloads

Put the "snapshot" into an archive.

  1. Put all the downloaded files into one directory.
  2. Right-click on the directory.
  3. Click "7-Zip".
  4. Click "add to archive".
  5. Apply these key settings:
    • compression level = store (no compression)

Save the archive to online storage and emergency offline storage.

7. Passwords

Back up all passwords.

  1. Export all passwords from all vaults in 1Password (follow the instructions given by 1Password).
  2. If multiple files were created by 1Password, put them all into a single directory.
  3. Right-click on the directory.
  4. Click "7-Zip".
  5. Click "add to archive".
  6. Apply these key settings:
    • compression level = store (no compression)
    • enter a password for encryption
    • encryption method = AES-256

Copy the archive that is created to emergency offline storage ONLY.


Thank you for following the proper data storage procedures.

About Jacob

I'm Jacob! I'm the guy this website is named after. No wait... I'm just the guy who made this website. Anyway, I like sharing my wisdom and I'm documenting my life for historical accuracy (because I think I'm going to be rich and successful one day).
