25 Jan 2025 - tsp
Last update 25 Jan 2025
19 mins
Manually publishing blog articles to a Facebook Page can quickly become a tedious and time-consuming process, especially when managing multiple social media platforms simultaneously. Automation offers a solution by streamlining workflows, reducing the likelihood of errors, and ensuring a consistent online presence. Additionally, automation significantly decreases the manual effort required, allowing you to focus on more strategic tasks. This approach uses a static site generator, the Facebook API, and a web server to handle authentication and token storage, creating an efficient and seamless connection between your blog and its audience on Facebook.
Static site generators, when compared to traditional content management systems (CMS), provide substantial advantages. They enable a simplified frontend setup, drastically reducing the attack surface and improving efficiency and scalability. This translates to lower operational costs and increased reliability. Furthermore, static sites excel in leveraging caching mechanisms such as CDNs or services like Amazon S3, optimizing performance for high-traffic websites. Unlike traditional CMS solutions that rely on dynamic scripting for every request, static sites focus on delivering prebuilt files, alleviating the server’s workload.
When combined with automated build systems and version control tools like Jenkins and Git, static site generators retain the flexibility of dynamic content management. Authors can write their articles offline using straightforward formats such as Markdown or graphical editors and then push the content to version control. Workflows can incorporate editorial review processes akin to those used in software development. Automated systems periodically generate and deploy updated sites through tools like rsync or cloud storage solutions, ensuring a smooth and robust publishing cycle.
Automating the social media distribution of blog articles enhances this process further by saving time, maintaining consistent updates, and increasing audience engagement through timely publications. It also frees up resources, enabling teams to focus on more valuable activities. In short, automating repetitive tasks is a cornerstone of operational efficiency, and implementing this solution for blog publishing is a step toward smarter and more sustainable content management.
An automated publishing pipeline requires a few key components. First, you need an existing and functioning static website generated by a tool such as Jekyll or Hugo. This static site serves as the foundation for your blog content. To integrate with Facebook, you’ll need Python installed on your system, along with the necessary libraries like requests and BeautifulSoup, which can be installed via pip.
$ pip install beautifulsoup4 requests feedparser
Additionally, a Facebook account is required, along with administrative access to the Facebook Page you intend to manage.
For hosting the authentication and token management workflow, a WSGI-capable server is essential. While Apache with mod_wsgi
is a common choice, you can use any WSGI server. Alternatively, you could opt for a self-hosted setup and redirect the OAuth
flow to this server on your local machine, depending on your environment and needs. This flexibility allows you to adapt
the system to your existing infrastructure, ensuring seamless integration.
All Facebook API call callbacks must be directed to HTTPS-secured endpoints to maintain a secure communication channel. Additionally, the token obtained through the API provides access to your Facebook page and must be treated as highly confidential. Ensure that API credentials are stored securely, preventing the webserver from exposing them to end users while allowing authorized scripts to access them. Always verify that the required permissions are correctly configured for seamless integration.
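On the filesystem level this typically means making the token file readable only by the account the scripts run under. A minimal illustration (the path is a placeholder; on the real server you would also chown the file to the service user):

```shell
# Demonstrated on /tmp/token.dat; in production use the real path of
# your token file, owned by the user the WSGI process and cron job run as.
touch /tmp/token.dat
chmod 600 /tmp/token.dat        # owner read/write only
stat -c '%a' /tmp/token.dat     # prints 600
```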
Let’s begin with the first component: the login and token management script. (If you don’t want to run a WSGI script, skip to the end of this section for a shortcut suited to manual, one-off deployment.) Accessing Facebook’s API requires a token, which serves as an authentication mechanism. Your application is identified by an APP_ID and an APP_SECRET, both of which are generated through the Facebook Developer Console (https://developers.facebook.com/). To initiate user login and obtain a short-lived (approximately one hour) access token, users are redirected to Facebook’s OAuth endpoint. During this process, you must request specific OAuth permissions to ensure the application can perform the required actions. These permissions include business_management, pages_manage_posts, pages_read_engagement, and pages_show_list.
These permissions ensure that the app can operate efficiently while adhering to Facebook’s access control policies. It is important to request only the permissions necessary for the app’s functionality to maintain security and comply with Facebook’s guidelines.
Since short-lived tokens are unsuitable for background services, the next step exchanges this token for a long-lived user token. For our publishing purposes this token does not expire; for user-information access such tokens carry a three-month expiry, but this does not affect our scripts, as they do not request user-specific data.
The first step is to create a new Facebook App. When administering a page, select “Business” as the app type rather than “Consumer”
and define its category as “Other.” Then, configure basic settings, including a display name, app domains (your own domains and
subdomains), a contact email, and a privacy policy URI (this is mandatory for the app to go live). Under the Advanced settings,
you can specify which web servers are permitted to access the Facebook API for your application. Be sure to copy your APP_ID
and APP_SECRET
for future use.
Next, use the “Add Product” feature in the Developer Console to select “Facebook Login for Business.” In the settings, define the redirect URI. Then navigate to the configurations of Facebook Login and create a new setup using the Page Management template. Once added, edit the Page Management template to include the necessary permissions: business_management, pages_manage_posts, pages_read_engagement, and pages_show_list. This completes the app’s configuration.
While your Facebook App remains in development mode, posts will only be visible to you and other test roles. This enables posting during development without affecting the live environment. API limits still apply of course.
While developing, you can test the requests we are going to make using the Facebook Graph Explorer at https://developers.facebook.com/tools/explorer/ and the Access Token Debugger at https://developers.facebook.com/tools/debug/accesstoken/.
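The same sanity check can be scripted. The sketch below issues the /me/accounts query the publishing script relies on later; the helper names are my own, and the HTTP call is separated from the response parsing so the parsing logic can be verified offline against the documented response shape:

```python
import requests

GRAPH_API_URL = "https://graph.facebook.com/v21.0"

def fetch_accounts(token):
    """Query /me/accounts for all pages the given user token can manage."""
    response = requests.get(f"{GRAPH_API_URL}/me/accounts",
                            params={"access_token": token})
    response.raise_for_status()
    return response.json()

def summarize_pages(accounts):
    """Reduce a /me/accounts response to (name, id, tasks) tuples."""
    return [(p["name"], p["id"], p.get("tasks", []))
            for p in accounts.get("data", [])]

# Example usage (requires a stored token; the path is a placeholder):
#   token = open("/path/to/token.dat").read().strip()
#   for name, page_id, tasks in summarize_pages(fetch_accounts(token)):
#       print(f"{name} ({page_id}): {', '.join(tasks)}")
```

If the configured page is missing from the output, or its task list lacks CREATE_CONTENT, the token or the app permissions need fixing before automation will work.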
To integrate the WSGI script with Apache, you need to configure the VirtualHost as shown below:
WSGIScriptAlias /fbpageapi /path/to/your/wsgi/fbapipageapi.py
WSGIScriptAlias /fbpageapi/callback /path/to/your/wsgi/fbapipageapi.py
WSGIDaemonProcess yourdomain user=your_user group=your_group processes=1 threads=1
WSGIProcessGroup yourdomain
In addition, the mod_wsgi module must be loaded into Apache. This is done by adding the following line to the httpd.conf configuration file:
LoadModule wsgi_module /usr/local/libexec/apache24/mod_wsgi.so
This setup ensures that the WSGI script is properly served by Apache, providing the necessary environment for handling Facebook authentication and token management workflows.
The process begins when a user accesses our designated URI. In this straightforward implementation, users are automatically redirected to Facebook’s OAuth endpoint without the need for form inputs. This immediate redirection simplifies the user journey and ensures an efficient and user-friendly authentication process.
Upon successful login, Facebook redirects the user back to our callback URI. At this stage, the short-lived token provided by Facebook is exchanged for a long-lived token, which is then securely stored in a file for future use. It is critical to note that the callback must be served over HTTPS to meet Facebook’s security requirements and ensure a secure token exchange process.
Let’s implement the script that facilitates the Facebook login and token management workflow. This script redirects the user’s browser to Facebook’s login form, where they are prompted to grant the following permissions: business_management, pages_manage_posts, pages_read_engagement, and pages_show_list. Upon the user’s approval, an access token is obtained and exchanged for a long-lived access token. This token is then stored securely in a token.dat file. For simplicity, static configurations are used within the script. The script can be deployed via mod_wsgi or executed as a standalone application.
from wsgiref.simple_server import make_server
from urllib.parse import urlencode, parse_qs
import requests

# Application credentials and configuration
APP_ID = "XXXXXXXXXXXXX"               # Your Facebook App ID
APP_SECRET = "XXXXXXXXXXXXXXXXXXXXXX"  # Your Facebook App Secret
REDIRECT_URI = "https://yourdomain.com/fbpageapi/callback"  # Your callback URL
TOKEN_FILE = "/path/to/your/token.dat"  # File to store the access token

def application(environ, start_response):
    """WSGI application handling Facebook OAuth authentication and token management."""
    path = environ['PATH_INFO']
    params = {k: v[0] for k, v in parse_qs(environ['QUERY_STRING']).items()}

    if path in ("", "/"):
        # Step 1: Redirect the user to Facebook's OAuth login page
        query = urlencode({
            'client_id': APP_ID,
            'redirect_uri': REDIRECT_URI,
            'scope': 'business_management,pages_manage_posts,pages_read_engagement,pages_show_list'
        })
        fb_url = f"https://www.facebook.com/v21.0/dialog/oauth?{query}"
        start_response('302 Found', [('Location', fb_url)])
        return []

    elif path == "/callback":
        # Step 2: Handle the Facebook callback and exchange the code for a token
        code = params.get('code')
        if code:
            # Exchange the authorization code for a short-lived token
            token_response = requests.get(
                "https://graph.facebook.com/v21.0/oauth/access_token",
                params={
                    'client_id': APP_ID,
                    'redirect_uri': REDIRECT_URI,
                    'client_secret': APP_SECRET,
                    'code': code
                }).json()
            access_token = token_response.get('access_token')
            if access_token:
                # Step 3: Exchange the short-lived token for a long-lived token
                long_token_response = requests.get(
                    "https://graph.facebook.com/v21.0/oauth/access_token",
                    params={
                        'grant_type': 'fb_exchange_token',
                        'client_id': APP_ID,
                        'client_secret': APP_SECRET,
                        'fb_exchange_token': access_token
                    }).json()
                long_lived_token = long_token_response.get('access_token')
                if long_lived_token:
                    # Step 4: Store the long-lived token securely
                    with open(TOKEN_FILE, 'w') as token_file:
                        token_file.write(long_lived_token)
                    start_response('200 OK', [('Content-Type', 'text/plain')])
                    return [b"Long-lived token successfully fetched and stored."]
        # Error handling for token exchange failures
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return [b"Failed to fetch token."]

    # Handle undefined paths
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [f"Not found: {path}".encode('utf-8')]

if __name__ == "__main__":
    # Run the WSGI application as a standalone server for testing
    httpd = make_server('', 8051, application)
    print("Serving on port 8051...")
    httpd.serve_forever()
Keep in mind that this script permits any user to log in and potentially overwrite the stored token. To prevent unauthorized access, it is essential to secure this script with measures such as HTTP authentication. Properly restricting access ensures the integrity and confidentiality of the token management process.
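One straightforward way to restrict access, assuming the Apache deployment shown above, is HTTP Basic authentication around the WSGI alias. The paths and realm name below are placeholders; the user file is created with the htpasswd utility:

```apache
<Location /fbpageapi>
    AuthType Basic
    AuthName "Facebook token management"
    AuthUserFile /path/to/your/htpasswd
    Require valid-user
</Location>
```

Since the OAuth redirect back from Facebook is performed by the already-authenticated browser, protecting the callback path as well should not break the login flow.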
This implementation ensures secure and efficient management of Facebook OAuth tokens for page management tasks.
In case you don’t want to use an exposed WSGI script, you can also use the Graph Explorer to fetch a short-lived user token and then use the Access Token Debugger to convert it into a long-lived one. This token can then be stored in the token.dat file manually. This is of course a hackish approach, but it is sufficient for a hobby platform, and it reduces the attack surface even further. You still have to create the Facebook application as described in the previous section, though.
The purpose of the publishing script is to automate the posting of blog articles from a static website to a Facebook page. The script operates by integrating metadata extraction, file tracking, and interaction with the Facebook Graph API. Here’s how it works:
The publishing script begins by loading our long-lived user token from a file on our machine. This token enables access to the Facebook page and allows for content creation. It then loads an internal JSON-based database that keeps track of published posts and their metadata. This database is used to identify whether a post is new or has been updated since the last run. A JSON-based “database” is sufficient for small-scale deployments; it can simply be replaced with a real database later, when the page grows.
The script fetches a list of Facebook pages that the user has access to and verifies that the page configured in the script is accessible. This includes ensuring that the user has the necessary privileges, such as CREATE_CONTENT, to publish content to the page.
Once these preliminary checks are complete, the script iterates over a predefined set of directories containing rendered HTML files. For each file, it extracts metadata using Open Graph (OG) meta tags, calculates a unique hash of the metadata, and compares it with the stored hash in the database. If the hash is new or differs from the stored value, the file is flagged for posting or updating. The script also limits the number of posts it publishes per run. By setting a maximum cap on new posts, the script ensures that a larger backlog of content can be processed gradually without overwhelming the audience or appearing spammy. This measured approach helps to maintain consistent exposure and engagement with the articles.
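To make the change-detection step concrete, here is the hashing approach in isolation: a SHA-256 digest over the canonically serialized metadata, so that dictionary ordering does not influence the result.

```python
import hashlib
import json

def metadata_hash(meta):
    """Stable fingerprint of a post's Open Graph metadata.

    sort_keys=True canonicalizes the JSON serialization, so the
    hash only changes when the metadata itself changes.
    """
    return hashlib.sha256(
        json.dumps(meta, sort_keys=True).encode()).hexdigest()

a = {"og:title": "Hello", "og:description": "World"}
b = {"og:description": "World", "og:title": "Hello"}  # same data, different order
assert metadata_hash(a) == metadata_hash(b)
assert metadata_hash(a) != metadata_hash({"og:title": "Hello",
                                          "og:description": "Changed"})
```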
The script uses a JSON file (published.json) as a lightweight database. It loads this database at the start of the script and updates it whenever a new post is published or an existing post is modified. Functions include:
database_prepare: Loads the JSON database into memory.
database_check_present: Checks if a file has already been processed.
database_check_dirty: Determines if a file’s metadata has changed.
database_update: Updates the database with the latest metadata hash.
Using BeautifulSoup, the script parses each HTML file to extract metadata such as the title, description, image, and URL from OG meta tags. If any metadata is missing, the script attempts to infer or construct the necessary information to ensure the post is complete.
The script interacts with the Facebook Graph API to publish or update posts. Key functions include:
facebook_post: Posts new content to the Facebook page. If applicable and enabled, it backdates the post using a timestamp inferred from the file’s modification time or directory structure.
facebook_update: Updates an existing post with modified metadata.
The script processes files following the steps outlined above:
#!/usr/local/bin/python3.9
import requests
import os
import hashlib
import json
from datetime import datetime, timedelta
from bs4 import BeautifulSoup

PAGE_ID = "XXXXXXXXXXXXXX"  # Set your page ID here (either title or real ID is sufficient)
TOKEN_FILE = "/path/to/token.dat"
DBFILE = "/path/to/published.json"
BASEDIR_WWW = "/usr/www/www.example.com/www/"  # Watchdirs are relative to the base dir
MAX_NEW_PER_RUN = 1  # Limits the maximum number of published posts per run
GRAPH_API_URL = "https://graph.facebook.com/v21.0"
BACKDATE_THRESHOLD = datetime.now() - timedelta(days=2)
EARLIEST_DATE = datetime.strptime("2025-01-12 22:45:42", '%Y-%m-%d %H:%M:%S')  # We cannot backdate before our page creation unfortunately
BACKDATE_ENABLE = True

WATCHDIRS = [  # Only files inside those subdirectories are watched and published
    "math",
    "2022",
    "2023",
    "2024",
]

# A hack'ish JSON file "database". Each entry maps a filename to a
# dictionary holding the metadata hash and the Facebook post ID.
database = {}

def database_prepare():
    global database
    try:
        with open(DBFILE, 'r') as infile:
            database = json.load(infile)
    except Exception as e:
        print(f"No database available: {e}")

def database_shutdown():
    pass  # We already write on update ...

def database_check_present(filename):
    return filename in database

def database_check_dirty(filename, hashval):
    if filename not in database:
        return True
    return database[filename].get("hash") != hashval

def database_get_postid(filename):
    return database[filename].get("post_id")

def database_update(filename, hashval, postid):
    global database
    database[filename] = {"hash": hashval, "post_id": postid}
    # Note: to stay consistent we write out to file after each update ...
    with open(DBFILE, 'w') as outfile:
        json.dump(database, outfile)

def get_backdate(fname):
    # Use the folder date if the path contains a YYYY/MM/DD structure
    folder_date = None
    parts = fname.split(os.sep)  # Split path into parts
    for i in range(len(parts) - 2):  # Check for YYYY/MM/DD pattern
        try:
            folder_date = datetime.strptime("/".join(parts[i:i+3]), "%Y/%m/%d")
            break
        except ValueError:
            continue
    mod_time = datetime.fromtimestamp(os.path.getmtime(fname))
    if folder_date and (folder_date < BACKDATE_THRESHOLD) and (folder_date > EARLIEST_DATE) and BACKDATE_ENABLE:
        return folder_date
    elif (mod_time < BACKDATE_THRESHOLD) and (mod_time > EARLIEST_DATE) and BACKDATE_ENABLE:
        return mod_time
    return None

def facebook_post(meta, token, page, backdate=None):
    data = {
        "message": f"{meta['og:title']}\n\n{meta['og:description']}",
        "link": meta["og:url"],
        "access_token": page['access_token']
    }
    if backdate is not None:
        data["backdated_time"] = backdate.isoformat()
        data["backdated_time_granularity"] = "day"
    # Post to the resolved numeric page ID (PAGE_ID may be a page name)
    response = requests.post(f"{GRAPH_API_URL}/{page['id']}/feed", data=data)
    if response.status_code == 200:
        print(f"Posted: {meta['og:title']}")
        return response.json().get("id")
    else:
        print(f"Failed to post {meta['og:title']}")
        print(f"{response.json()}")
        return False

def facebook_update(meta, token, postid, page):
    data = {
        "message": f"{meta['og:title']}\n\n{meta['og:description']}",
        "link": meta["og:url"],
        "access_token": page['access_token']
    }
    if "og:image" in meta:
        data["picture"] = meta["og:image"]
    elif "og:image:secure_url" in meta:
        data["picture"] = meta["og:image:secure_url"]
    response = requests.post(f"{GRAPH_API_URL}/{postid}", data=data)
    if response.status_code == 200:
        print(f"Updated: {meta['og:title']}")
        return True
    else:
        print(f"Failed to update {meta['og:title']}")
        print(f"{response.json()}")
        return False

def process_file(fname, token, page):
    # Fetch metadata ...
    with open(fname, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f, 'html.parser')
    meta = {}
    for tag in ["og:title", "og:description", "og:image", "og:url"]:
        node = soup.find("meta", property=tag)
        if node:
            meta[tag] = node.get("content", "").strip()
    if not all(meta.get(tag) for tag in ["og:title", "og:description", "og:url"]):
        return False
    print(f"{meta['og:title']}")

    # Calculate the metadata hash
    meta_hash = hashlib.sha256(json.dumps(meta, sort_keys=True).encode()).hexdigest()
    if not database_check_dirty(fname, meta_hash):
        return False

    # We have a dirty file ...
    if database_check_present(fname):
        # Already present - this is a MODIFIED entry
        postid = database_get_postid(fname)
        if postid and facebook_update(meta, token, postid, page):
            database_update(fname, meta_hash, postid)
        else:
            return False
    else:
        # Not present - this is a NEW entry
        backdate = get_backdate(fname)
        postid = facebook_post(meta, token, page, backdate=backdate)
        if postid:
            database_update(fname, meta_hash, postid)
        else:
            return False
    return True

def main():
    newtoday = 0
    database_prepare()

    # Load the access token
    with open(TOKEN_FILE, 'r') as f:
        token = f.read().strip()

    # Get the list of pages we are allowed to manage
    response = requests.get(f"{GRAPH_API_URL}/me/accounts", params={"access_token": token})
    if response.status_code != 200:
        print("Failed to retrieve list of pages, maybe missing permissions")
        print(response.json())
        return False

    # Check if we find the page we should update, either by name or by ID
    pagedata = None
    for page in response.json().get("data", []):
        if (page["name"] == PAGE_ID) or (page["id"] == PAGE_ID):
            if "CREATE_CONTENT" not in page["tasks"]:
                print("Not allowed to create content on the desired page")
                return False
            # We found the page ...
            pagedata = page
            break
    if pagedata is None:
        print("Requested page is not found in your accessible pages")
        return False

    # Iterate over all watched files and call process_file on them
    allfiles = []
    for directory in WATCHDIRS:
        realpath = os.path.join(BASEDIR_WWW, directory)
        for root, _, files in os.walk(realpath):
            for file in files:
                if file.endswith(".html"):
                    allfiles.append(os.path.join(root, file))

    for file_path in sorted(allfiles):
        if process_file(file_path, token, pagedata):
            newtoday = newtoday + 1
        if (MAX_NEW_PER_RUN > 0) and (newtoday >= MAX_NEW_PER_RUN):
            break

    database_shutdown()

if __name__ == "__main__":
    main()
This script automates the publication of blog articles to Facebook by combining metadata extraction, file tracking, and API interactions. Its modular design makes it easy to adapt for different workflows or scale for larger websites. By reducing manual effort and ensuring consistent updates, the script streamlines social media management for static site blogs.
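To close the loop, the publishing script can be scheduled via cron alongside the site's regular build job. The schedule, script name, and log path below are examples only:

```crontab
# Attempt to publish pending articles twice a day; with
# MAX_NEW_PER_RUN = 1 this drains a backlog gradually.
0 8,20 * * * /usr/local/bin/python3.9 /path/to/fbpagepublish.py >> /var/log/fbpagepublish.log 2>&1
```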
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/