HTML scraper in Python for SpellTastic.
Latest commit b94d5ddf1d by Nicolas FRANCO: "readme adjustements 📝" (1 year ago)


Spell Scrapper 📜 🐍

About this repository

This repository was built to maintain an up-to-date spell database for SpellTastic, a cross-platform spell manager for Pathfinder.

Data source

All data is retrieved from d20pfsrd, the #1 Pathfinder Roleplaying Game rules reference site. The full spell list can be found on its spells page.

The most recently extracted data is available as a YAML file in the outputs directory.
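If you want to use the extracted data directly from Python, the YAML file can be read with PyYAML's `yaml.safe_load`. This is a minimal sketch, assuming the file is `outputs/spells.yaml` and that its top level is either a list of spell mappings or a mapping keyed by spell name; the actual schema is not documented here, so inspect the file before relying on specific keys. The `load_spells` helper is hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List

import yaml  # PyYAML


def load_spells(path: Path) -> List[Dict[str, Any]]:
    """Load the scraped spells and normalise them to a list of dicts.

    Assumption: the top level is either a list of spell mappings or a
    mapping of spell name -> attributes; adjust if the real file differs.
    """
    with path.open(encoding="utf-8") as handle:
        data = yaml.safe_load(handle)
    if data is None:
        # An empty file parses to None.
        return []
    if isinstance(data, dict):
        # Hypothetical layout: {"Fireball": {...}} -> [{"name": "Fireball", ...}]
        return [{"name": name, **(attrs or {})} for name, attrs in data.items()]
    return list(data)
```

For example, `spells = load_spells(Path("outputs/spells.yaml"))` followed by `len(spells)` gives the number of spells extracted. `safe_load` is used rather than `yaml.load` because it only builds plain Python objects (lists, dicts, strings, numbers) and cannot instantiate arbitrary classes.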

Getting Started

Prerequisites

  • Python 3.6+

Python libraries

  • BeautifulSoup4
  • Requests
  • lxml
  • PyYAML
  • sqlite3 (included in the Python standard library, no installation needed)
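Before running the scripts, you can check that the interpreter and the libraries above are available. This sketch uses only the standard library; note that import names differ from package names (BeautifulSoup4 is imported as `bs4`, PyYAML as `yaml`). The helper names are hypothetical and not part of this repository.

```python
import importlib.util
import sys
from typing import Dict

# Package name (as listed above) -> module name used in `import` statements.
REQUIRED_MODULES = {
    "BeautifulSoup4": "bs4",
    "Requests": "requests",
    "lxml": "lxml",
    "PyYAML": "yaml",
    "sqlite3": "sqlite3",  # part of the standard library
}


def check_environment() -> Dict[str, bool]:
    """Return {package name: importable?} for every required library."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }


def python_is_supported(minimum=(3, 6)) -> bool:
    """True if the running interpreter meets the minimum version (3.6+)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python OK" if python_is_supported() else "Python 3.6+ required")
    for package, found in check_environment().items():
        print(f"{package:15} {'found' if found else 'MISSING'}")
```

Any library reported as missing can be installed with the pip command in the Installing section.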

Installing

  1. Clone the repository:
git clone https://github.com/your_username/pathfinder-spell-scraper.git
  2. Install the required libraries:
pip install requests beautifulsoup4 lxml pyyaml

Usage

Scraping

  1. Run scrap-spells.py to scrape the spell information from the website:
python scrapping/scrap-spells.py
  2. This command generates a spells.yaml file containing all spells and their attributes in the outputs directory.
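The fetch-and-parse approach behind a scraper like this can be sketched with Requests and BeautifulSoup. This is not the code in scrap-spells.py: the index URL is left as a parameter, the function names are hypothetical, and the assumption that spell pages have `/spells/` in their URL is illustrative only, so inspect the real d20pfsrd markup before adapting it. The parser defaults to Python's built-in `"html.parser"`; pass `parser="lxml"` to use the lxml parser listed above.

```python
from typing import List
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Hypothetical pattern: assumes spell pages have "/spells/" in their URL path.
SPELL_LINK_MARKER = "/spells/"


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and raise on HTTP errors (4xx/5xx)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_spell_links(html: str, base_url: str, parser: str = "html.parser") -> List[str]:
    """Return absolute, de-duplicated spell URLs found on an index page."""
    soup = BeautifulSoup(html, parser)
    links: List[str] = []
    seen = set()
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page they came from.
        url = urljoin(base_url, anchor["href"])
        if SPELL_LINK_MARKER in url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def extract_spell_name(html: str, parser: str = "html.parser") -> str:
    """Return the spell name, assuming it is the page's first <h1>."""
    soup = BeautifulSoup(html, parser)
    heading = soup.find("h1")
    return heading.get_text(strip=True) if heading else ""
```

A full scraper would call `fetch_html` on the index page, pass the result to `extract_spell_links`, then fetch and parse each spell page in turn. Pausing briefly between requests (for example with `time.sleep`) keeps the load on the site reasonable.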

While scraping, a progress bar in your terminal shows the number of spells scraped and the estimated time remaining. Scraping all spells takes about 20 minutes.
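The time-remaining estimate can be derived from the average time per spell so far. The sketch below is a standard-library illustration of that arithmetic; the actual script may rely on a progress-bar library instead, and the `format_progress` name and bar layout are hypothetical.

```python
def format_progress(done: int, total: int, elapsed_seconds: float, width: int = 30) -> str:
    """Render a one-line text progress bar with an estimated time remaining.

    ETA = average seconds per spell so far * spells still to scrape.
    """
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    if done:
        remaining = elapsed_seconds / done * (total - done)
        minutes, seconds = divmod(int(round(remaining)), 60)
        eta = f"{minutes:d}m{seconds:02d}s left"
    else:
        # No spells scraped yet, so there is no average to extrapolate from.
        eta = "estimating..."
    return f"[{bar}] {done}/{total} spells, {eta}"
```

Inside a scraping loop it could be redrawn in place with `print("\r" + format_progress(i, total, time.monotonic() - start), end="", flush=True)`. Because the estimate is a running average, it becomes more reliable as more spells are scraped.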

Database

  1. Build an SQLite database (.db) file by running spell-db.py:
python database/spell-db.py
  2. The script generates a spells.db file with a spell table containing all the spell information. This file is also written to the outputs directory.
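Once spells.db exists, it can be queried with Python's built-in sqlite3 module. The sketch below shows the general build-and-query pattern against an in-memory database; the table name `spell` comes from this README, but the column names (`name`, `school`, `level`) and both helper functions are hypothetical, so check the real schema (for example with `PRAGMA table_info(spell);`) before reusing the queries.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def build_spell_table(conn: sqlite3.Connection, spells: Iterable[Dict[str, str]]) -> None:
    """Create a `spell` table and insert one row per spell dict.

    Hypothetical schema: the real spell-db.py may store different columns.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spell ("
        " name TEXT PRIMARY KEY,"
        " school TEXT,"
        " level TEXT)"
    )
    # Named placeholders let sqlite3 read values straight from each dict.
    conn.executemany(
        "INSERT OR REPLACE INTO spell (name, school, level) "
        "VALUES (:name, :school, :level)",
        spells,
    )
    conn.commit()


def spells_by_school(conn: sqlite3.Connection, school: str) -> List[Tuple[str, str]]:
    """Return (name, level) pairs for one school, sorted by name."""
    rows = conn.execute(
        "SELECT name, level FROM spell WHERE school = ? ORDER BY name", (school,)
    )
    return rows.fetchall()
```

To query the real file, open it with `sqlite3.connect("outputs/spells.db")`; note that `connect` silently creates an empty database if the path does not exist, so run the build step first.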