25 Jan 2025 - tsp
Last update 25 Jan 2025
19 mins
Manually publishing blog articles to a Facebook Page can quickly become a tedious and time-consuming process, especially when managing multiple social media platforms simultaneously. Automation offers a solution by streamlining workflows, reducing the likelihood of errors, and ensuring a consistent online presence. Additionally, automation significantly decreases the manual effort required, allowing you to focus on more strategic tasks. This approach uses a static site generator, the Facebook API, and a web server to handle authentication and token storage, creating an efficient and seamless connection between your blog and its audience on Facebook.
Static site generators, when compared to traditional content management systems (CMS), provide substantial advantages. They enable a simplified frontend setup, drastically reducing the attack surface and improving efficiency and scalability. This translates to lower operational costs and increased reliability. Furthermore, static sites excel in leveraging caching mechanisms such as CDNs or services like Amazon S3, optimizing performance for high-traffic websites. Unlike traditional CMS solutions that rely on dynamic scripting for every request, static sites focus on delivering prebuilt files, alleviating the server’s workload.
When combined with automated build systems and version control tools like Jenkins and Git, static site generators retain the flexibility of dynamic content management. Authors can write their articles offline using straightforward formats such as Markdown or graphical editors and then push the content to version control. Workflows can incorporate editorial review processes akin to those used in software development. Automated systems periodically generate and deploy updated sites through tools like rsync or cloud storage solutions, ensuring a smooth and robust publishing cycle.
Automating the social media distribution of blog articles enhances this process further by saving time, maintaining consistent updates, and increasing audience engagement through timely publications. It also frees up resources, enabling teams to focus on more valuable activities. In short, automating repetitive tasks is a cornerstone of operational efficiency, and implementing this solution for blog publishing is a step toward smarter and more sustainable content management.
An automated publishing pipeline requires a few key components. First, you need an existing and functioning static website generated by a tool such as Jekyll or Hugo. This static site serves as the foundation for your blog content. To integrate with Facebook, you’ll need Python installed on your system, along with the necessary libraries like requests and BeautifulSoup, which can be installed via pip.
$ pip install beautifulsoup4 requests feedparser
Additionally, a Facebook account is required, along with administrative access to the Facebook Page you intend to manage.
For hosting the authentication and token management workflow, a WSGI-capable server is essential. While Apache with mod_wsgi
is a common choice, you can use any WSGI server. Alternatively, you could opt for a self-hosted setup and redirect the OAuth
flow to this server on your local machine, depending on your environment and needs. This flexibility allows you to adapt
the system to your existing infrastructure, ensuring seamless integration.
All Facebook API call callbacks must be directed to HTTPS-secured endpoints to maintain a secure communication channel. Additionally, the token obtained through the API provides access to your Facebook page and must be treated as highly confidential. Ensure that API credentials are stored securely, preventing the webserver from exposing them to end users while allowing authorized scripts to access them. Always verify that the required permissions are correctly configured for seamless integration.
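On the filesystem level this typically means making the token file readable only by the account the scripts run under. A minimal illustration (the path is a placeholder; on the real server you would also chown the file to the service user):

```shell
# Demonstrated on /tmp/token.dat; in production use the real path of
# your token file, owned by the user the WSGI process and cron job run as.
touch /tmp/token.dat
chmod 600 /tmp/token.dat        # owner read/write only
stat -c '%a' /tmp/token.dat     # prints 600
```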
Let’s begin with the first component: the login and token management script. (If you don’t want to run a WSGI script, skip to the end of this section for a shortcut suited to manual, one-off deployment.) Accessing Facebook’s API requires a token, which serves as an authentication mechanism. Your application is identified by an APP_ID and an APP_SECRET, both of which are generated through the Facebook Developer Console (https://developers.facebook.com/). To initiate user login and obtain a short-lived (approximately one hour) access token, users are redirected to Facebook’s OAuth endpoint. During this process, you must request specific OAuth permissions to ensure the application can perform the required actions. These permissions include business_management, pages_manage_posts, pages_read_engagement, and pages_show_list.
These permissions ensure that the app can operate efficiently while adhering to Facebook’s access control policies. It is important to request only the permissions necessary for the app’s functionality to maintain security and comply with Facebook’s guidelines.
Since short-lived tokens are unsuitable for background services, the next step exchanges this token for a long-lived user token. For our publishing purposes this token does not expire; for user-information access such tokens carry a three-month expiry, but this does not affect our scripts, as they do not request user-specific data.
The first step is to create a new Facebook App. When administering a page, select “Business” as the app type rather than “Consumer”
and define its category as “Other.” Then, configure basic settings, including a display name, app domains (your own domains and
subdomains), a contact email, and a privacy policy URI (this is mandatory for the app to go live). Under the Advanced settings,
you can specify which web servers are permitted to access the Facebook API for your application. Be sure to copy your APP_ID
and APP_SECRET
for future use.
Next, use the “Add Product” feature in the Developer Console to select “Facebook Login for Business.” In the settings, define the redirect URI. Then navigate to the configurations of Facebook Login and create a new setup using the Page Management template. Once added, edit the Page Management template to include the necessary permissions: business_management, pages_manage_posts, pages_read_engagement, and pages_show_list. This completes the app’s configuration.
While your Facebook App remains in development mode, posts will only be visible to you and other test roles. This enables posting during development without affecting the live environment. API limits still apply of course.
While developing, you can test the requests we are going to make using the Facebook Graph Explorer at https://developers.facebook.com/tools/explorer/ and the Access Token Debugger at https://developers.facebook.com/tools/debug/accesstoken/.
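The same sanity check can be scripted. The sketch below issues the /me/accounts query the publishing script relies on later; the helper names are my own, and the HTTP call is separated from the response parsing so the parsing logic can be verified offline against the documented response shape:

```python
import requests

GRAPH_API_URL = "https://graph.facebook.com/v21.0"

def fetch_accounts(token):
    """Query /me/accounts for all pages the given user token can manage."""
    response = requests.get(f"{GRAPH_API_URL}/me/accounts",
                            params={"access_token": token})
    response.raise_for_status()
    return response.json()

def summarize_pages(accounts):
    """Reduce a /me/accounts response to (name, id, tasks) tuples."""
    return [(p["name"], p["id"], p.get("tasks", []))
            for p in accounts.get("data", [])]

# Example usage (requires a stored token; the path is a placeholder):
#   token = open("/path/to/token.dat").read().strip()
#   for name, page_id, tasks in summarize_pages(fetch_accounts(token)):
#       print(f"{name} ({page_id}): {', '.join(tasks)}")
```

If the configured page is missing from the output, or its task list lacks CREATE_CONTENT, the token or the app permissions need fixing before automation will work.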
To integrate the WSGI script with Apache, you need to configure the VirtualHost as shown below:
WSGIScriptAlias /fbpageapi /path/to/your/wsgi/fbapipageapi.py
WSGIScriptAlias /fbpageapi/callback /path/to/your/wsgi/fbapipageapi.py
WSGIDaemonProcess yourdomain user=your_user group=your_group processes=1 threads=1
WSGIProcessGroup yourdomain
In addition, the mod_wsgi module must be loaded into Apache. This is done by adding the following line to the httpd.conf configuration file:
LoadModule wsgi_module /usr/local/libexec/apache24/mod_wsgi.so
This setup ensures that the WSGI script is properly served by Apache, providing the necessary environment for handling Facebook authentication and token management workflows.
The process begins when a user accesses our designated URI. In this straightforward implementation, users are automatically redirected to Facebook’s OAuth endpoint without the need for form inputs. This immediate redirection simplifies the user journey and ensures an efficient and user-friendly authentication process.
Upon successful login, Facebook redirects the user back to our callback URI. At this stage, the short-lived token provided by Facebook is exchanged for a long-lived token, which is then securely stored in a file for future use. It is critical to note that the callback must be served over HTTPS to meet Facebook’s security requirements and ensure a secure token exchange process.
Let’s implement the script that facilitates the Facebook login and token management workflow. This script redirects the user’s browser to Facebook’s login form, where they are prompted to grant the following permissions: business_management, pages_manage_posts, pages_read_engagement, and pages_show_list. Upon the user’s approval, an access token is obtained and exchanged for a long-lived access token. This token is then stored securely in a token.dat file. For simplicity, static configurations are used within the script. The script can be deployed via mod_wsgi or executed as a standalone application.
from wsgiref.simple_server import make_server
from urllib.parse import urlencode, parse_qs
import requests

# Application credentials and configuration
APP_ID = "XXXXXXXXXXXXX"               # Your Facebook App ID
APP_SECRET = "XXXXXXXXXXXXXXXXXXXXXX"  # Your Facebook App Secret
REDIRECT_URI = "https://yourdomain.com/fbpageapi/callback"  # Your callback URL
TOKEN_FILE = "/path/to/your/token.dat"  # File to store the access token

def application(environ, start_response):
    """WSGI application handling Facebook OAuth authentication and token management."""
    path = environ['PATH_INFO']
    params = {k: v[0] for k, v in parse_qs(environ['QUERY_STRING']).items()}

    if path in ("", "/"):
        # Step 1: Redirect the user to Facebook's OAuth login page
        query = urlencode({
            'client_id': APP_ID,
            'redirect_uri': REDIRECT_URI,
            'scope': 'business_management,pages_manage_posts,pages_read_engagement,pages_show_list'
        })
        fb_url = f"https://www.facebook.com/v21.0/dialog/oauth?{query}"
        start_response('302 Found', [('Location', fb_url)])
        return []

    elif path == "/callback":
        # Step 2: Handle the Facebook callback and exchange the code for a token
        code = params.get('code')
        if code:
            # Exchange the authorization code for a short-lived token
            token_response = requests.get(
                "https://graph.facebook.com/v21.0/oauth/access_token",
                params={
                    'client_id': APP_ID,
                    'redirect_uri': REDIRECT_URI,
                    'client_secret': APP_SECRET,
                    'code': code
                }).json()
            access_token = token_response.get('access_token')
            if access_token:
                # Step 3: Exchange the short-lived token for a long-lived token
                long_token_response = requests.get(
                    "https://graph.facebook.com/v21.0/oauth/access_token",
                    params={
                        'grant_type': 'fb_exchange_token',
                        'client_id': APP_ID,
                        'client_secret': APP_SECRET,
                        'fb_exchange_token': access_token
                    }).json()
                long_lived_token = long_token_response.get('access_token')
                if long_lived_token:
                    # Step 4: Store the long-lived token securely
                    with open(TOKEN_FILE, 'w') as token_file:
                        token_file.write(long_lived_token)
                    start_response('200 OK', [('Content-Type', 'text/plain')])
                    return [b"Long-lived token successfully fetched and stored."]
        # Error handling for token exchange failures
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return [b"Failed to fetch token."]

    # Handle undefined paths
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [f"Not found: {path}".encode('utf-8')]

if __name__ == "__main__":
    # Run the WSGI application as a standalone server for testing
    httpd = make_server('', 8051, application)
    print("Serving on port 8051...")
    httpd.serve_forever()
Keep in mind that this script permits any user to log in and potentially overwrite the stored token. To prevent unauthorized access, it is essential to secure this script with measures such as HTTP authentication. Properly restricting access ensures the integrity and confidentiality of the token management process.
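One straightforward way to restrict access, assuming the Apache deployment shown above, is HTTP Basic authentication around the WSGI alias. The paths and realm name below are placeholders; the user file is created with the htpasswd utility:

```apache
<Location /fbpageapi>
    AuthType Basic
    AuthName "Facebook token management"
    AuthUserFile /path/to/your/htpasswd
    Require valid-user
</Location>
```

Since the OAuth redirect back from Facebook is performed by the already-authenticated browser, protecting the callback path as well should not break the login flow.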
This implementation ensures secure and efficient management of Facebook OAuth tokens for page management tasks.
In case you don’t want to use an exposed WSGI script, you can also use the Graph Explorer to fetch a short-lived user token and then use the Access Token Debugger to convert it into a long-lived one. This token can then be stored in the token.dat file manually. This is of course a hackish approach, but it is sufficient for a hobby platform, and it reduces the attack surface even further. You still have to create the Facebook application as described in the previous section, though.
The purpose of the publishing script is to automate the posting of blog articles from a static website to a Facebook page. The script operates by integrating metadata extraction, file tracking, and interaction with the Facebook Graph API. Here’s how it works:
The publishing script begins by loading our long-lived user token from a file on our machine. This token enables access to the Facebook page and allows for content creation. It then loads an internal JSON-based database that keeps track of published posts and their metadata. This database is used to identify whether a post is new or has been updated since the last run. A JSON-based “database” is sufficient for small-scale deployments; it can simply be replaced with a real database later, when the page grows.
The script fetches a list of Facebook pages that the user has access to and verifies that the page configured in the script is accessible. This includes ensuring that the user has the necessary privileges, such as CREATE_CONTENT, to publish content to the page.
Once these preliminary checks are complete, the script iterates over a predefined set of directories containing rendered HTML files. For each file, it extracts metadata using Open Graph (OG) meta tags, calculates a unique hash of the metadata, and compares it with the stored hash in the database. If the hash is new or differs from the stored value, the file is flagged for posting or updating. The script also limits the number of posts it publishes per run. By setting a maximum cap on new posts, the script ensures that a larger backlog of content can be processed gradually without overwhelming the audience or appearing spammy. This measured approach helps to maintain consistent exposure and engagement with the articles.
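To make the change-detection step concrete, here is the hashing approach in isolation: a SHA-256 digest over the canonically serialized metadata, so that dictionary ordering does not influence the result.

```python
import hashlib
import json

def metadata_hash(meta):
    """Stable fingerprint of a post's Open Graph metadata.

    sort_keys=True canonicalizes the JSON serialization, so the
    hash only changes when the metadata itself changes.
    """
    return hashlib.sha256(
        json.dumps(meta, sort_keys=True).encode()).hexdigest()

a = {"og:title": "Hello", "og:description": "World"}
b = {"og:description": "World", "og:title": "Hello"}  # same data, different order
assert metadata_hash(a) == metadata_hash(b)
assert metadata_hash(a) != metadata_hash({"og:title": "Hello",
                                          "og:description": "Changed"})
```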
The script uses a JSON file (published.json) as a lightweight database. It loads this database at the start of the script and updates it whenever a new post is published or an existing post is modified. Functions include:
database_prepare: Loads the JSON database into memory.
database_check_present: Checks if a file has already been processed.
database_check_dirty: Determines if a file’s metadata has changed.
database_update: Updates the database with the latest metadata hash.
Using BeautifulSoup, the script parses each HTML file to extract metadata such as the title, description, image, and URL from OG meta tags. If any metadata is missing, the script attempts to infer or construct the necessary information to ensure the post is complete.
The script interacts with the Facebook Graph API to publish or update posts. Key functions include:
facebook_post: Posts new content to the Facebook page. If applicable and enabled, it backdates the post using a timestamp inferred from the file’s modification time or directory structure.
facebook_update: Updates an existing post with modified metadata.
The script processes files following the steps outlined above:
#!/usr/local/bin/python3.9
import requests
import os
import hashlib
import json
from datetime import datetime, timedelta
from bs4 import BeautifulSoup

PAGE_ID = "XXXXXXXXXXXXXX"  # Set your page ID here (either title or real ID is sufficient)
TOKEN_FILE = "/path/to/token.dat"
DBFILE = "/path/to/published.json"
BASEDIR_WWW = "/usr/www/www.example.com/www/"  # Watchdirs are relative to the base dir
MAX_NEW_PER_RUN = 1  # Limits the maximum number of published posts per run
GRAPH_API_URL = "https://graph.facebook.com/v21.0"
BACKDATE_THRESHOLD = datetime.now() - timedelta(days=2)
EARLIEST_DATE = datetime.strptime("2025-01-12 22:45:42", '%Y-%m-%d %H:%M:%S')  # We cannot backdate before our page creation unfortunately
BACKDATE_ENABLE = True

WATCHDIRS = [  # Only files inside those subdirectories are watched and published
    "math",
    "2022",
    "2023",
    "2024",
]

# A hack'ish JSON file "database". Each entry maps a filename to a
# dictionary holding the metadata hash and the Facebook post ID.
database = {}

def database_prepare():
    global database
    try:
        with open(DBFILE, 'r') as infile:
            database = json.load(infile)
    except Exception as e:
        print(f"No database available: {e}")

def database_shutdown():
    pass  # We already write on update ...

def database_check_present(filename):
    return filename in database

def database_check_dirty(filename, hashval):
    if filename not in database:
        return True
    return database[filename].get("hash") != hashval

def database_get_postid(filename):
    return database[filename].get("post_id")

def database_update(filename, hashval, postid):
    global database
    database[filename] = {"hash": hashval, "post_id": postid}
    # Note: to stay consistent we write out to file after each update ...
    with open(DBFILE, 'w') as outfile:
        json.dump(database, outfile)

def get_backdate(fname):
    # Use the folder date if the path contains a YYYY/MM/DD structure
    folder_date = None
    parts = fname.split(os.sep)  # Split path into parts
    for i in range(len(parts) - 2):  # Check for YYYY/MM/DD pattern
        try:
            folder_date = datetime.strptime("/".join(parts[i:i+3]), "%Y/%m/%d")
            break
        except ValueError:
            continue
    mod_time = datetime.fromtimestamp(os.path.getmtime(fname))
    if folder_date and (folder_date < BACKDATE_THRESHOLD) and (folder_date > EARLIEST_DATE) and BACKDATE_ENABLE:
        return folder_date
    elif (mod_time < BACKDATE_THRESHOLD) and (mod_time > EARLIEST_DATE) and BACKDATE_ENABLE:
        return mod_time
    return None

def facebook_post(meta, token, page, backdate=None):
    data = {
        "message": f"{meta['og:title']}\n\n{meta['og:description']}",
        "link": meta["og:url"],
        "access_token": page['access_token']
    }
    if backdate is not None:
        data["backdated_time"] = backdate.isoformat()
        data["backdated_time_granularity"] = "day"
    # Post to the resolved numeric page ID (PAGE_ID may be a page name)
    response = requests.post(f"{GRAPH_API_URL}/{page['id']}/feed", data=data)
    if response.status_code == 200:
        print(f"Posted: {meta['og:title']}")
        return response.json().get("id")
    else:
        print(f"Failed to post {meta['og:title']}")
        print(f"{response.json()}")
        return False

def facebook_update(meta, token, postid, page):
    data = {
        "message": f"{meta['og:title']}\n\n{meta['og:description']}",
        "link": meta["og:url"],
        "access_token": page['access_token']
    }
    if "og:image" in meta:
        data["picture"] = meta["og:image"]
    elif "og:image:secure_url" in meta:
        data["picture"] = meta["og:image:secure_url"]
    response = requests.post(f"{GRAPH_API_URL}/{postid}", data=data)
    if response.status_code == 200:
        print(f"Updated: {meta['og:title']}")
        return True
    else:
        print(f"Failed to update {meta['og:title']}")
        print(f"{response.json()}")
        return False

def process_file(fname, token, page):
    # Fetch metadata ...
    with open(fname, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f, 'html.parser')
    meta = {}
    for tag in ["og:title", "og:description", "og:image", "og:url"]:
        node = soup.find("meta", property=tag)
        if node:
            meta[tag] = node.get("content", "").strip()
    if not all(meta.get(tag) for tag in ["og:title", "og:description", "og:url"]):
        return False
    print(f"{meta['og:title']}")

    # Calculate the metadata hash
    meta_hash = hashlib.sha256(json.dumps(meta, sort_keys=True).encode()).hexdigest()
    if not database_check_dirty(fname, meta_hash):
        return False

    # We have a dirty file ...
    if database_check_present(fname):
        # Already present - this is a MODIFIED entry
        postid = database_get_postid(fname)
        if postid and facebook_update(meta, token, postid, page):
            database_update(fname, meta_hash, postid)
        else:
            return False
    else:
        # Not present - this is a NEW entry
        backdate = get_backdate(fname)
        postid = facebook_post(meta, token, page, backdate=backdate)
        if postid:
            database_update(fname, meta_hash, postid)
        else:
            return False
    return True

def main():
    newtoday = 0
    database_prepare()

    # Load the access token
    with open(TOKEN_FILE, 'r') as f:
        token = f.read().strip()

    # Get the list of pages we are allowed to manage
    response = requests.get(f"{GRAPH_API_URL}/me/accounts", params={"access_token": token})
    if response.status_code != 200:
        print("Failed to retrieve list of pages, maybe missing permissions")
        print(response.json())
        return False

    # Check if we find the page we should update, either by name or by ID
    pagedata = None
    for page in response.json().get("data", []):
        if (page["name"] == PAGE_ID) or (page["id"] == PAGE_ID):
            if "CREATE_CONTENT" not in page["tasks"]:
                print("Not allowed to create content on the desired page")
                return False
            # We found the page ...
            pagedata = page
            break
    if pagedata is None:
        print("Requested page is not found in your accessible pages")
        return False

    # Iterate over all watched files and call process_file on them
    allfiles = []
    for directory in WATCHDIRS:
        realpath = os.path.join(BASEDIR_WWW, directory)
        for root, _, files in os.walk(realpath):
            for file in files:
                if file.endswith(".html"):
                    allfiles.append(os.path.join(root, file))

    for file_path in sorted(allfiles):
        if process_file(file_path, token, pagedata):
            newtoday = newtoday + 1
        if (MAX_NEW_PER_RUN > 0) and (newtoday >= MAX_NEW_PER_RUN):
            break

    database_shutdown()

if __name__ == "__main__":
    main()
This script automates the publication of blog articles to Facebook by combining metadata extraction, file tracking, and API interactions. Its modular design makes it easy to adapt for different workflows or scale for larger websites. By reducing manual effort and ensuring consistent updates, the script streamlines social media management for static site blogs.
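To close the loop, the publishing script can be scheduled via cron alongside the site's regular build job. The schedule, script name, and log path below are examples only:

```crontab
# Attempt to publish pending articles twice a day; with
# MAX_NEW_PER_RUN = 1 this drains a backlog gradually.
0 8,20 * * * /usr/local/bin/python3.9 /path/to/fbpagepublish.py >> /var/log/fbpagepublish.log 2>&1
```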
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/