HTML scraper in Python for SpellTastic.

Nicolas FRANCO b94d5ddf1d readme adjustements 📝		2 years ago

Spell Scrapper 📜 🐍

About this repository

This repository was built to keep an up-to-date spell database for SpellTastic, a cross-platform spell manager for Pathfinder.

Data source

All data is retrieved from d20pfsrd, the #1 Pathfinder Roleplaying Game rules reference site. All spells can be found on its spells page.

The latest data extracted is available as a YAML file and can be found in the outputs directory.

Getting Started

Prerequisites

Python 3.6+

Python libraries

BeautifulSoup4
Requests
lxml
PyYAML
sqlite3 (part of the Python standard library, no installation needed)
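
Before running the scripts, it can help to confirm that the libraries listed above are importable. The snippet below is a minimal sketch and not part of this repository: the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative. It relies only on the standard library.

```python
import importlib.util

# Import names for the libraries listed above. The import name can differ
# from the package name (beautifulsoup4 -> bs4, PyYAML -> yaml).
REQUIRED_MODULES = ["bs4", "requests", "lxml", "yaml", "sqlite3"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, the pip command in the Installing section should cover it.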

Installing

Clone the repository

git clone https://github.com/your_username/pathfinder-spell-scraper.git

Install the required libraries

pip install requests beautifulsoup4 lxml pyyaml

Usage

Scraping

You can run scrap-spells.py to scrape the spell information from the website:

python scrapping/scrap-spells.py

This command will generate a file spells.yaml with all spells and their attributes. The file should be found in the outputs directory.
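
Once the file exists, it can be loaded with PyYAML. The following is a minimal sketch, assuming the script is run from the repository root so that `outputs/spells.yaml` resolves. The function name `load_spells` is illustrative, and the internal layout of the file is not documented here, so the example only reports the top-level container rather than assuming field names.

```python
from pathlib import Path

import yaml  # provided by the PyYAML package

# Assumed location: the outputs directory, relative to the repository root.
SPELLS_FILE = Path("outputs") / "spells.yaml"


def load_spells(path=SPELLS_FILE):
    """Parse the YAML file and return whatever top-level structure it holds."""
    with open(path, encoding="utf-8") as handle:
        # safe_load only builds plain Python objects (dict, list, str, ...).
        return yaml.safe_load(handle)


if __name__ == "__main__":
    if SPELLS_FILE.exists():
        spells = load_spells()
        # Report the container type and size without assuming any keys.
        print(type(spells).__name__, len(spells) if spells else 0)
    else:
        print(f"{SPELLS_FILE} not found; run the scraper first.")
```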

A progress bar should be displayed in your terminal while scraping, showing the time left and the number of spells scraped. The script should take about 20 minutes to scrape all spells.

Database

You can build a SQLite .db database file by running the spell-db.py script:

python database/spell-db.py

The script will generate a spells.db file with a spell table containing all the spell information. This file should also be found in the outputs directory.
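
To check the generated database, the standard `sqlite3` module is enough. The sketch below assumes the file is at `outputs/spells.db` and that the table is named `spell`, as described above. The column names are not documented here, so the hypothetical helper `describe_table` reads them from the database instead of hard-coding them.

```python
import sqlite3
from pathlib import Path

# Assumed location and table name, taken from the description above.
DB_FILE = Path("outputs") / "spells.db"
TABLE = "spell"


def describe_table(db_path, table=TABLE):
    """Return (column_names, row_count) for `table` in the given database."""
    connection = sqlite3.connect(str(db_path))
    try:
        # Identifiers cannot be bound as SQL parameters, so the table name is
        # interpolated; only pass trusted names here.
        info = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns = [row[1] for row in info]  # index 1 holds the column name
        (row_count,) = connection.execute(
            f'SELECT COUNT(*) FROM "{table}"'
        ).fetchone()
        return columns, row_count
    finally:
        connection.close()


if __name__ == "__main__":
    if DB_FILE.exists():
        columns, row_count = describe_table(DB_FILE)
        print(f"{row_count} rows, columns: {', '.join(columns)}")
    else:
        print(f"{DB_FILE} not found; build the database first.")
```

Listing the columns first makes it easier to write further queries against the real schema.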