Making a custom personal blog and resume page using Flask

Apr 18, 2024

When I started this journey I dove head first into Python. It was the second programming language I ever learned and, out of all the languages I’ve learned since, I still remember it as my favourite. There’s just something about how readable its code tends to be, thanks to its use of whitespace, that I missed whenever I used other languages. So I did an online self-paced bootcamp and was reminded of why I enjoy this language so much. Then I learned about Data Science and Machine Learning and simply got hooked. Eventually though, I acknowledged that even if I got really good at DS/ML projects, I’d probably need a way to showcase them in an app or something, so why not learn to make a web app? I had also been advised to blog my learning journey and have a webpage for my resume since I’m going the self-taught route. So I figured, why not check all those boxes at once? That’s how the excuse to make this website using Flask came to mind. Unfortunately, going through the entire thing from start to finish would be far too long and in-depth for a blog post. So instead I’m going to briefly go through the pieces that I believe are of my own design, or at least are pieces that you couldn’t just ask an AI chatbot to produce for you.

Curriculum Vitae (CV) Page

Having a webpage to show off your portfolio and resume is great. But how about one that can continuously generate itself from your CV data? In my search to see if a “data only resume” format existed, I came across JSON Resume, which appeared somewhat popular and was what I was looking for. With it you simply follow the schema with your own data and host it as a public Gist (example); their service can then generate different resume themes via the ?theme= query string (example #1, example #2). Heck, you can even use it to generate AI cover letters, which is pretty neat.

So I went and created my own Gist using this schema with the intention of creating a webpage that will continuously update and generate itself as I add experiences and projects. The Flask portion of this webpage is fairly straightforward. I start by creating a simple function for grabbing the Gist’s data in my models.py:

models.py
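import json

import requests

def get_json_data(gist_url):
    #Attempt to grab the data from the online raw Gist
    try:
        #A timeout stops the page from hanging if the Gist is unreachable
        response = requests.get(gist_url, timeout=10)
        response.raise_for_status()
        data = json.loads(response.text)
        return (data, True)
    #Fall back to the static file if the request fails or the body isn't valid JSON
    except (requests.RequestException, ValueError):
        #The with-block closes the file on exit, so no explicit close() is needed
        with open(app.config['STATIC_CV_JSON_LOC'], 'r') as file:
            data = json.load(file)
        return (data, False)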

This simple function uses the Requests library to fetch the data from a Gist URL. In case there’s an issue with the URL, I also keep a back-up static file it can pull from. That’s why I return both the JSON data as a Python dictionary and a boolean: the boolean lets the page know whether to load images from their URLs or from their static counterparts.

Then I use the above function in an @app.route for the cv page:

routes.py
@app.route('/cv')
def cv():
    cv_data, use_gist_img = get_json_data(app.config['CV_JSON_GIST_URL'])
    #Grab desired parts so we don't need to pass the full dict
    basics = cv_data['basics']
    skills = cv_data['skills']
    experience = cv_data['work']
    educ = cv_data['education']
    certs = cv_data['certificates']
    projects = cv_data['projects']
    pubs = cv_data['publications']
    interests = cv_data['interests']
    #Convert JSON Schema date formats within Experience, Education, Publications,
    # and Certificates from 2020-04 to Apr 2020
    conv_dict = {'01': 'Jan', '02': 'Feb', '03': 'Mar', '04': 'Apr', '05': 'May', '06': 'Jun',
                 '07': 'Jul', '08': 'Aug', '09': 'Sept', '10':'Oct', '11':'Nov', '12':'Dec'}
    for item in experience:
        item['startDate'] = conv_dict[item['startDate'][5:7]] + ' ' + item['startDate'][0:4]
        item['endDate'] = conv_dict[item['endDate'][5:7]] + ' ' + item['endDate'][0:4]
    for item in educ:
        item['startDate'] = conv_dict[item['startDate'][5:7]] + ' ' + item['startDate'][0:4]
        item['endDate'] = conv_dict[item['endDate'][5:7]] + ' ' + item['endDate'][0:4]
    for item in certs:
        item['date'] = conv_dict[item['date'][5:7]] + ' ' + item['date'][0:4]
    for item in pubs:
        item['releaseDate'] = conv_dict[item['releaseDate'][5:7]] + ' ' + item['releaseDate'][0:4]
    return render_template('CV.html', title='CV', basics=basics, skills=skills, experience=experience, educ=educ,
                           certs=certs, projects=projects, pubs=pubs, use_gist_img=use_gist_img, interests=interests)

Since the dict has a lot of data which I won’t be using for this page, I extract just the parts I need; it may only be a little less data, but I find it easier to work with. Additionally, because my CV data stores dates in the schema’s YYYY-MM format, I convert these to a short month name followed by the year (for example, 2020-04 becomes Apr 2020), which I think looks better. Finally, I render the CV.html template and pass all this data.
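
The four conversion loops in the route repeat the same slicing logic, so one option is to pull it into a small helper. The following is only a sketch, not the code this site runs: format_year_month and its missing parameter are hypothetical names, and the handling of absent or year-only dates is an assumption about entries (such as a current job with no endDate) that the original loops do not cover.

```python
# Hypothetical helper; the month labels mirror the conv_dict used in the route.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")


def format_year_month(value, missing="Present"):
    """Turn '2020-04' or '2020-04-15' into 'Apr 2020'."""
    if not value:
        # Assumption: an empty or absent date means the entry is ongoing.
        return missing
    if len(value) < 7:
        # Year-only values such as '2020' are returned unchanged.
        return value
    month_index = int(value[5:7]) - 1
    return f"{MONTHS[month_index]} {value[0:4]}"


print(format_year_month("2020-04"))  # Apr 2020
print(format_year_month(None))       # Present
```

In the route, a call such as item['endDate'] = format_year_month(item.get('endDate')) could then stand in for each of the slicing expressions.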

Designing and creating the CV.html page and its CSS styles was actually the bulk of the work. Thanks to Bootstrap, however, styling, aligning, and adding features was fairly straightforward. Below is a sample which generates the first part of the Experience section. Flask uses the Jinja template library for working with HTML files, so here I have a for loop to iterate through the experience list which was passed into this template file. Since I only wish to display my latest experiences and have the rest in an expansion, I iterate through the first 3 using Python-style list slicing, which Jinja supports. Using Bootstrap’s row and 12-column grid system, each experience is a row which I split into a 1-size column for the timeline, an auto-sized column for the icon, and a column for the actual experience content. To make that content look spiffy on various screen sizes, I take advantage of Bootstrap’s breakpoints, setting it to 7/8/remainder as the window size changes, which creates a nice responsive row.

Additionally, since showing the highlights of each experience can be quite a bit of text, I decided to place this information into an expanding section using Bootstrap’s collapse feature, which was very easy to modify for my needs. This responsive design approach with collapsible features is what I use throughout the page, with a styles.css file to pretty it up.

CV.html
    {% for xp in experience[:3] %}
    <div class="row justify-content-md-center">
        <div class="col-1 timespan">{{xp['endDate']}}<br>to<br>{{xp['startDate']}}</div>
        <div class="col-auto ico">
            <div class="entry-dot"></div>
            <img src="{{url_for('static', filename='images/')}}{{xp['name'].split(' ')[0]}}.jpg" class="ico"/>
        </div>
        <div class="col-xxl-7 col-lg-8 col desc">
            <h6><b>{{xp['position']}}</b> @ <i>{{xp['name']}}</i></h6>
            <p class="summ">{{xp['summary']}}</p>
            <button class="btn btn-primary hl_btn hl_btn-primary collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#hl_acc_{{loop.index}}" aria-expanded="false" aria-controls="hl_acc_{{loop.index}}">
                Highlights: <img src="{{url_for('static', filename='images/arrow-up.svg')}}" style="width:17px;"/>
            </button>
            <div class="collapse" id="hl_acc_{{loop.index}}">
                <div class="card card-body hl_card-body">
                    <ul>
                        {% for highlight in xp['highlights'] %}
                        <li class="summ">{{highlight}}</li>
                        {% endfor %}
                    </ul>
                </div>
            </div>    
        </div>
    </div>
    {% endfor %}

Jupyter Notebooks Blog using Quarto

When I first started this project, I knew I wanted to use Jupyter Notebooks to write posts and show code/output as I learn more and build projects. In my search I came across Quarto, which is an open-source scientific and technical publishing system that allows you to author content in Jupyter notebooks and publish it in various formats, including the HTML I needed. Actually, Quarto can even create full Quarto-powered websites if you want. I, however, only needed a robust and feature-rich way to convert .ipynb to .html, which Quarto certainly has.

Quarto has a lot of features and options which will add CSS and Javascript files to the output depending on which options are chosen. For the needs of this blog, the following options are sufficient. Output options in Quarto are placed in the top cell of the notebook with three dashes at top and bottom:

---
title: "Sample Post"
abstract: "This post is an example for testing"
format:
   html:
      code-copy: true
      highlight-style: pygments
      extract-media: extra_media_filename
jupyter: python3
---

With these header options, running the command “quarto render <file_path>.ipynb --to html” in a Jupyter terminal will create the following output structure…

…which is a number of files due to the features available. Since we don’t want redundant files throughout the blog, the CSS and JS files can be copied to a central location shared by all posts, leaving just the main .html file and its media. Of course, changing where these files sit relative to the blog post’s .html file means we’ll need to modify the reference locations in the file itself. Additionally, to make adding posts easy, we can add some back-end functions which allow us to ZIP the HTML/media files we want for upload and have the Flask application organize/modify them to our needs while storing metadata in an SQLite3 database. To accomplish this, we start by adding a db.Model class to models.py and a FlaskForm to forms.py:

models.py
class BlogPost(db.Model):
    #Use SQLite3 DB to store blog post meta-data.
    id: so.Mapped[int] = so.mapped_column(primary_key=True)
    post_name: so.Mapped[str] = so.mapped_column(sa.String(128), index=True, unique=True)
    post_title: so.Mapped[str] = so.mapped_column(sa.String(128))
    date: so.Mapped[datetime] = so.mapped_column(default=lambda: date.today())
    description: so.Mapped[str] = so.mapped_column(sa.String(256))

forms.py
class AddBlogPostForm(FlaskForm):
    #Form to add a new blog post via a zip file with defined structure.
    #The actual post Title and Description will be extracted from the uploaded file.
    post_name = StringField('Post Name (for URL)', validators=[DataRequired()])
    post_zip_file = FileField('Zip File', validators=[FileRequired()])
    submit = SubmitField('Add Post')

    #Validate that I don't accidentally make two posts with same name in DB
    def validate_post_name(self, post_name):
        post = db.session.scalar(sa.select(BlogPost).where(BlogPost.post_name == post_name.data))
        if post is not None:
            raise ValidationError('A blog post with that Post Name already exists. Please choose a different name.')
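
Because post_name ends up in the post’s URL and in file paths, it may also be worth restricting which characters it can contain. This is a sketch of one such check, not something the form above already does: is_url_safe_name and the rule of lowercase letters, digits, and single hyphens are hypothetical choices made for illustration.

```python
import re

# Assumed naming rule: lowercase letters and digits, separated by single hyphens.
_SAFE_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_url_safe_name(name):
    """Return True only if the whole name matches the assumed rule."""
    return _SAFE_NAME.fullmatch(name) is not None


print(is_url_safe_name("flask-cv-blog"))  # True
print(is_url_safe_name("../secrets"))     # False
```

Inside validate_post_name, a failed check could raise ValidationError in the same way the duplicate-name check does.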

This simple form has a string field for the post_name and a file field for the ZIP file. The post_name will be a unique name which will reference the post and its files, so it needs validation. Then, to access and load this form, we can create a new route to it in the routes.py file where we can extract the submitted information:

routes.py
@app.route('/add_post', methods=['GET', 'POST'])
@login_required
def add_post():
    form = AddBlogPostForm()
    if form.validate_on_submit():
        #Read the uploaded .zip as bytes:
        file_data = form.post_zip_file.data.read()
        #Run the create_blog_post function which extracts and modifies the Quarto output to fit this web app:
        post_title, description = create_blog_post(file_data, form.post_name.data)
        #Add post metadata to database:
        new_post = BlogPost(post_name=form.post_name.data, post_title=post_title, description=description)
        db.session.add(new_post)
        db.session.commit()
        #Redirect to new post:
        return redirect(url_for('blog_post', post_name=form.post_name.data))

    return render_template('add_blog_post.html', form=form)

Here we unzip/modify/organize the uploaded files and extract the post’s Title and Description from the .html file using the create_blog_post() function which I’ve added below. By extracting the Title/Description we can include them in the database metadata, so we can query and display them in the list of blog posts. The create_blog_post() function lives in the models.py file and is as follows:

models.py
def create_blog_post(zip_file, post_name):
    #First extract the uploaded .zip, modify the html, and then extract/store in static/templates
    #Returns (post_title, description)
    post_title = ''
    description = ''
    zip_in_memory = io.BytesIO(zip_file)
    with zipfile.ZipFile(zip_in_memory, 'r') as zf:
        for item in zf.infolist():
            if item.is_dir(): continue
            f_name, file_ext = os.path.splitext(item.filename)
            if file_ext.lower() == '.html':
                _handle_data = bs4.builder._htmlparser.BeautifulSoupHTMLParser.handle_data
                bs4.builder._htmlparser.BeautifulSoupHTMLParser.handle_charref   = lambda cls,s: _handle_data(cls, '&#'+s+';')
                bs4.builder._htmlparser.BeautifulSoupHTMLParser.handle_entityref = lambda cls,s: _handle_data(cls, '&'+s+';')
                bs4.dammit.EntitySubstitution._substitute_html_entity = lambda o: o.group(0)
                bs4.dammit.EntitySubstitution._substitute_xml_entity  = lambda o: o.group(0)
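                html_raw = zf.read(item).decode('UTF-8')
                #Swap Jinja's delimiters for HTML entities so Jinja won't try to parse them:
                html_raw = html_raw.replace('{{', '&#123;&#123;')
                html_raw = html_raw.replace('{%', '&#123;%')
                html_raw = html_raw.replace('{#', '&#123;#')
                html_raw = html_raw.replace('#}', '#&#125;')
                #Build the soup from the escaped text rather than re-reading the raw file:
                soup = BeautifulSoup(html_raw, 'html.parser')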
                #Remove top !DOCTYPE, which is usually always line #1, but catch if it's not and iterate contents to find it:
                if isinstance(soup.contents[0], Doctype):
                    soup.contents[0].extract()
                else:
                    for item in soup.contents:
                        if isinstance(item, Doctype):
                            item.extract()
                #Remove all <meta>:
                for item in soup.find_all('meta'):
                    item.decompose()
                #Extract title and description text:
                post_title = soup.title.get_text()
                description = soup.find(class_='abstract').contents[2].get_text().strip() #abstract div has a title div, then the actual abstract.
                #Remove <html> and <head> wrappers:
                soup.html.unwrap()
                soup.head.unwrap()
                #Remove the <title> and the <header>:
                soup.title.decompose()
                soup.header.decompose()
                #Modify the <script> and <link> 'src' items from filename_files/libs/ into jinja static location
                for src in soup.find_all('script',{"src":True}):
                    src['src'] = src['src'].replace(f_name + "_files/libs/", "{{url_for('static', filename='quarto_inc/')}}")
                for href in soup.find_all('link',{"href":True}):
                    href['href'] = href['href'].replace(f_name + "_files/libs/", "{{url_for('static', filename='quarto_inc/')}}")
                #Modify the <img> 'src' items from filename_files/ into jinja static post_name locations
                    #Note: if f_name has a space in it, Quarto changes it to %20
                for img in soup.find_all('img',{"src":True}):
                    img_src_str = img['src'].split('/')[-1]
                    img['src'] = "{{url_for('static', filename='blog_post_media/" + post_name + "/" + img_src_str + "')}}"
                    #If the image uses glightbox then also edit its a-tag href
                    if img.parent.get('class') and img.parent.name == 'a' and 'lightbox' in list(img.parent.get('class')):
                        img.parent['href'] = "{{url_for('static', filename='blog_post_media/" + post_name + "/" + img_src_str + "')}}"
                #Create app/templates/blog_posts folder if it doesn't exist
                if not os.path.exists(app.config['BLOG_HTML_LOC']):
                    os.makedirs(app.config['BLOG_HTML_LOC'])
                #Save the output to post_name.html:
                with open(os.path.join(app.config['BLOG_HTML_LOC'], post_name + '.html'), "w", encoding='utf-8') as f:
                    f.write(str(soup))
            else:
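                #Quarto's 'extract-media' option places all media into a named folder, but still has some sub-folder structure to it.
                #Flatten that structure so every file lands directly in the post's media folder.
                #Using [-1] also copes with files that sit at the top level of the ZIP:
                item.filename = item.filename.rsplit('/', 1)[-1]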
                zf.extract(item, path=os.path.join(app.config['BLOG_FILES_LOC'], post_name))
    return (post_title, description)

In this function we use BytesIO() from Python’s io library to load the ZIP file into memory (rather than using a temporary storage location, since there should always be enough memory for these small ZIP files) and go through the contents. If the file is a .html file then we use the BeautifulSoup4 library to extract the Title and Description, strip the tags we don’t need, and then rewrite the <script>, <link>, and <img> tags’ src & href locations (plus the href of any lightbox <a> tag) into Flask url_for() calls for Jinja to resolve. Speaking of Jinja, before creating the soup of HTML there was actually a minor bug that required a modification of BS4’s HTML parser. If a blog post contains any Jinja syntax in paragraphs or code blocks, such as this post, then Jinja will attempt to parse it and an error will occur. To fix this we first replace Jinja’s delimiters with their equivalent HTML entities. However, BS4’s HTML parser automatically converts HTML entities back and forth when decoding/encoding, so we also essentially force BS4 to leave them alone when handling the data. Finally, we write this new ‘soup’ of HTML to a file with the filename ‘post_name.html’. If the file isn’t a .html file, then it must be a media file (reminder: we’ve moved those CSS/JS files to a central location so they’re not in the ZIP), so we simply extract it into a subfolder of Flask’s static folder named after the post_name so we can reference it.
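
The delimiter-escaping step on its own can be sketched as follows. This is a simplified illustration rather than the function above: escape_jinja_openers is a hypothetical name, it only handles the three opening delimiters, and it leaves out the BeautifulSoup patching entirely. The idea is that Jinja only starts parsing at an opening delimiter, so writing the opening brace as an HTML entity keeps the text looking the same in the browser while Jinja ignores it.

```python
import html

# Opening delimiters of Jinja expressions, statements, and comments,
# mapped to the same text with "{" written as the entity &#123;.
JINJA_OPENERS = {
    "{{": "&#123;&#123;",
    "{%": "&#123;%",
    "{#": "&#123;#",
}


def escape_jinja_openers(markup):
    """Replace Jinja opening delimiters so the template engine skips them."""
    for opener, entity in JINJA_OPENERS.items():
        markup = markup.replace(opener, entity)
    return markup


original = "<code>{{ title }} {% if x %} {# note #}</code>"
escaped = escape_jinja_openers(original)
print(escaped)
# A browser decodes the entities, so the reader still sees the original text.
print(html.unescape(escaped) == original)  # True
```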

And that’s it! With this we have a fairly easy workflow to write posts in Jupyter Notebooks and then upload them to a blog site. We can query posts from the database and order them by upload date, we can add delete/modify functions, and whatever additional features we want. Flask is pretty awesome since you can just add what you need as you build a webapp compared to other back-end frameworks which supply everything upfront making it hard to stay lightweight like this. Plus, it’s pretty cool to have your own custom built blogging app!

Extra: Creating an Admin login for managing blog posts

Since I don’t want just anyone who stumbles upon the /add_post and /delete_post pages to mess with my blog, I went and placed these behind an Admin login. Surprisingly, it was a bit difficult to find “how-to” content online for a single user using the Flask-Login extension. Usually, a database of Users/Passwords is used for this and I just didn’t need that. However, to have just one user without a database while still utilizing Flask-Login’s tools, you’ll still need a User class:

models.py
from flask_login import UserMixin

class User(UserMixin):
    id = 1

Flask-Login provides a UserMixin helper class which has default implementations for the methods that Flask-Login expects user objects to have. So we can just use this and id = 1 for our single user. Then, when it comes time to log this single user in, we can just use the login_user() function and pass a new User object into it:

routes.py
@app.route('/login', methods=['GET', 'POST'])
def login():
    form = LoginForm()
    if form.validate_on_submit():
        if form.username.data == app.config['ADMIN_USERNAME'] and form.password.data == app.config['ADMIN_PASSWORD']:
            login_user(user=User())
            return redirect(url_for('index'))
        else:
            flash('Invalid username or password')
            return redirect(url_for('login'))
    return render_template('login.html', title='Sign In', form=form)
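
Flask-Login also needs a user loader callback so it can rebuild the user object from the id stored in the session on each request. The post does not show that part, so the following is a sketch under assumptions: load_user is a hypothetical name, the User class here is a plain stand-in for the User(UserMixin) class above so the snippet runs without Flask installed, and the registration with the login manager is only indicated in a comment.

```python
class User:
    """Stand-in for the post's User(UserMixin) class."""
    id = 1

    def get_id(self):
        # UserMixin's get_id() likewise returns the id as a string.
        return str(self.id)


def load_user(user_id):
    """Return the single admin user for its own id, otherwise None."""
    if user_id == User().get_id():
        return User()
    return None


# In the real app the function would be registered with Flask-Login, e.g.
# by decorating it with @login.user_loader, where login = LoginManager(app)
# is an assumed name for the app's login manager.
print(load_user("1") is not None)  # True
print(load_user("2"))              # None
```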

Pretty straightforward, but since I couldn’t find any searchable info about this, I figured I’d add it here. The login info can simply be hardcoded into the config.py file, too.
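
If the credentials do live in config.py, the comparison itself could use hmac.compare_digest from the standard library, which is designed to reduce timing side channels compared with ==. This is an optional sketch, not what the route above does; credentials_match is a hypothetical helper name.

```python
import hmac


def credentials_match(username, password, expected_username, expected_password):
    """Compare both fields without short-circuiting on the first mismatch."""
    # compare_digest needs bytes (or ASCII-only str), so encode first.
    user_ok = hmac.compare_digest(username.encode("utf-8"),
                                  expected_username.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"),
                                  expected_password.encode("utf-8"))
    return user_ok and pass_ok


print(credentials_match("admin", "s3cret", "admin", "s3cret"))  # True
print(credentials_match("admin", "guess", "admin", "s3cret"))   # False
```

In the login route, the two == checks could then become a single call that passes the form data along with app.config['ADMIN_USERNAME'] and app.config['ADMIN_PASSWORD'].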

An offline public copy of this project (without the config login/secret-key info, of course!) is available on my GitHub, which I’ll try to keep updated to the latest version.