fhildeb/blog-web-scraper

blog-web-scraper

Blog web scraper built by Felix Hildebrandt as a final thesis for Web Analytics in 2019. The fetched data was further analysed by Lukas Brueggemann as an extended group project for a Big Data science course.

NOTE: Code comments are written in German.

Default Blog

By default, the web scraper is tailored to the Kuechenchaotin blog, a well-known German food and travel website. Because this project is only a showcase, it can be adapted to any other domain on demand.

GUI Showcase on Windows

[Screenshot: Startup] [Screenshot: Scraping]

Analytics

The extended analysis of the sample blog is in the /metrics folder of this repository. Its structure mirrors the table of contents in this README and includes subfolders for internal and external analytics.

  1. Link to General Metrics Documentation
  2. Link to Internal Metrics Analysis
  3. Link to External Metrics Analysis

General Metrics

As stated, the tool can be used to measure the value, success, and outcome of different web blogs. Based on the script, the following core metrics can be derived:

  • Conversation Rate in Comments per Post
  • Outcome in Posts per Month
  • Content Created in Words per Post
  • Blog Value
  • Applied Business Models
  • Social Media Communities
  • Communication Pillars
  • Alignment and Media Design
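
The first three metrics in the list above are simple averages over the scraped posts. The following is a minimal Python sketch of how they could be computed, not the code used in this repository. The `Post` record, its field names, and the sample data are all hypothetical, and "per month" is assumed to mean per calendar month that contains at least one post.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """Hypothetical record for one scraped post (field names are assumptions)."""
    published: date
    text: str
    comment_count: int


# All three helpers assume a non-empty list of posts.

def comments_per_post(posts):
    """Conversation rate: average number of comments per post."""
    return sum(p.comment_count for p in posts) / len(posts)


def posts_per_month(posts):
    """Outcome: average number of posts per calendar month with at least one post."""
    months = Counter((p.published.year, p.published.month) for p in posts)
    return len(posts) / len(months)


def words_per_post(posts):
    """Content created: average whitespace-separated word count per post."""
    return sum(len(p.text.split()) for p in posts) / len(posts)


# Made-up sample data, for illustration only.
sample = [
    Post(date(2019, 1, 5), "Pasta with tomato sauce", 4),
    Post(date(2019, 1, 20), "A weekend in Lisbon", 2),
    Post(date(2019, 2, 3), "Quick apple cake recipe", 6),
]

print(comments_per_post(sample))  # 4.0
print(posts_per_month(sample))    # 1.5
print(words_per_post(sample))     # 4.0
```

Months without any post are ignored here; counting them as zero would lower the posts-per-month figure.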

Internal Metrics

Based on the sample data of over 600 posts, various predictions, evaluations, and assessments can be made:

  • Comment Count Predictions
  • Comment Frequency
  • Interaction Trendline
  • Publication Date Measures
  • External Link Extraction
  • Post Category Analysis
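
An interaction trendline and a simple comment count prediction can be illustrated with an ordinary least-squares line through the comment counts. This README does not say which method the analysis in /metrics uses, so the sketch below is an assumption, and the function name `fit_trendline` and the data are made up for illustration.

```python
def fit_trendline(xs, ys):
    """Fit an ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: post index (0 = oldest) and number of comments on that post.
post_index = [0, 1, 2, 3, 4]
comments = [2, 4, 6, 8, 10]

slope, intercept = fit_trendline(post_index, comments)

# Extrapolate the line one step ahead as a naive prediction for the next post.
predicted_next = slope * 5 + intercept
print(slope, intercept, predicted_next)  # 2.0 2.0 12.0
```

A positive slope indicates that comments per post grow over time. A straight line is only a rough first approximation for real blog data.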

External Metrics

Further, external sources such as SimilarWeb were used to combine internal and external metrics with traffic and search data from social media listings or referrals:

  • Referring Traffic
  • Search Visits
  • Search Engagement
  • Channel Analytics
  • Demographics
  • Geographics
  • Browsing Categories
  • Total Visits

Contributors

  • Lukas Brueggemann

Tools
