grepping Facebook Posts

Forcing The Facebook to be less useless

A terminal emulator on Debian (Photo credit: Lukas, Unsplash License)

Facebook allows you to transfer content to external providers. For example, you can send photos to Google Photos, and posts to Google Drive or Dropbox. (These are only a few examples; there are many other destinations.)

To do this, visit the "Transfer a copy of your information" page and choose the appropriate options.

Possible Use Case

Suppose you want to keep local, readable copies of your Facebook posts so that you can search the text for specific words or reuse the content of posts elsewhere.

  1. Transfer everything to Google Docs using Facebook’s transfer tool.
  2. This will export all posts to docx format in a folder in your Google Drive.
  3. Next, use rclone to sync the Google Drive folder to your local drive:
    rclone sync gdrive:/my_facebook_posts /home/tomk/Documents/my_facebook_posts/
  4. Now you have a collection of docx files, each of which is just a ZIP archive of XML files. The default filenames are in the format 'Facebook Post:2014-01-06T09:25:58.docx'.
  5. Next, you can change to the directory containing the docx files:
    cd /home/tomk/Documents/my_facebook_posts/
  6. Then unzip each file, extract the text, and page through it with less (leave the * unquoted so that the shell expands the glob):
    for f in Face*.docx; do unzip -p "$f" word/document.xml | sed 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'; done | less
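  7. The command above sends the text to less, but you can redirect the output of the whole loop to a file instead. Put the redirection after done; inside the loop, > would overwrite the file on every pass and keep only the last post:
    for f in Face*.docx; do unzip -p "$f" word/document.xml | sed 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'; done > all_posts.txt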
  8. These commands are rough: I would like to refine them to extract each post to its own file, extract all text to a file per year, etc.
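
For the refinements in step 8, one possible approach is a small POSIX shell script like the following. It is a sketch rather than a finished tool: it assumes GNU sed (for "\n" in a replacement), the default 'Facebook Post:YYYY-MM-DDTHH:MM:SS.docx' filenames, and that it runs in the directory holding the docx files. The function names xml_to_text and post_year and the output directories posts/ and by_year/ are hypothetical choices for illustration. It applies the same substitutions as step 6, but splits them across two sed calls so that the non-printable filter works line by line and never has to spare newlines.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed and the default Facebook export filenames.
# xml_to_text, post_year, posts/ and by_year/ are hypothetical names.

# Read word/document.xml on stdin and print plain text, one paragraph per line:
# 1) turn each closing </w:p> into a newline (GNU sed extension);
# 2) strip the remaining tags, then any non-printable characters.
xml_to_text() {
    sed 's/<\/w:p>/\n/g' | sed 's/<[^>]\{1,\}>//g; s/[^[:print:]]//g'
}

# Print the year from a name like 'Facebook Post:2014-01-06T09:25:58.docx'.
post_year() {
    stamp=${1#*Post:}               # 2014-01-06T09:25:58.docx
    printf '%s\n' "${stamp%%-*}"    # 2014
}

mkdir -p posts by_year
rm -f by_year/*.txt                 # rebuild the per-year files from scratch

for f in 'Facebook Post'*.docx; do
    [ -e "$f" ] || continue         # no match: the glob is left unexpanded
    base=$(basename "$f" .docx)
    # One text file per post...
    unzip -p "$f" word/document.xml | xml_to_text > "posts/$base.txt"
    # ...and the same text appended to that year's file.
    cat "posts/$base.txt" >> "by_year/$(post_year "$f").txt"
done
```

Because the shell expands the glob in sorted order and the timestamps share one fixed format, posts should land in each by_year file in chronological order. The per-post filenames keep the colons from the original names, which is fine on Linux but not on every filesystem.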

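Since the point of the exercise is to grep your posts, a small search function can also work directly on the docx files. The following sketch (grep_posts is a hypothetical name) prints the name of every post whose text contains a given word, ignoring case. It strips the tags before searching because Word can split a single word across several XML runs, and it replaces each closing </w:p> with a space so that the end of one paragraph does not run into the start of the next.

```shell
#!/bin/sh
# Sketch: list the Facebook post .docx files whose text contains $1.
# grep_posts is a hypothetical helper name; run it in the posts directory.
grep_posts() {
    for f in 'Facebook Post'*.docx; do
        [ -e "$f" ] || continue     # no match: the glob is left unexpanded
        # Paragraph ends become spaces, other tags are dropped, then grep
        # searches case-insensitively (-i) and quietly (-q); "--" protects
        # search terms that start with a dash.
        if unzip -p "$f" word/document.xml |
            sed 's/<\/w:p>/ /g; s/<[^>]\{1,\}>//g' |
            grep -qi -- "$1"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, after defining the function, grep_posts 'vacation' prints the matching filenames. Characters such as & are still escaped in the XML (as &amp;), so search for the escaped form when a term contains them.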