Creating a Project
Begin by double-clicking on the new_project.ipynb notebook. Follow the instructions to create a new project, and then click the link to navigate to the project folder.
Note
It is good practice to close and halt the new_project.ipynb notebook after you have opened the project folder in a new browser tab.
A project is a folder containing copies of the WE1S project template files and folders. Here is a brief description of each part of the project:
The README.md File
This is a Markdown file containing details about the version of the WE1S template used to create the project and any metadata about the project that you added in new_project.ipynb. It is meant to be a human-readable guide to the content of the project.
The datapackage.json File
The datapackage.json file is a manifest of your project's resources that complies with the WE1S manifest schema and the Frictionless Data project specification. The purpose of this file is to enable easier interoperability between your data and tools outside the WE1S Workspace. The datapackage.json file is a JSON file containing metadata about the project and a complete list of the file paths to all resources in the project. If you export your project, the export module will detect any files you have added and add their paths to datapackage.json.
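As an illustration of how an outside tool might use the manifest, the following is a minimal sketch that reads datapackage.json and lists the resource paths it records. It assumes only the general Frictionless Data layout, in which resources sit in a top-level "resources" array and each has a "path" that is a string or a list of strings. The function names list_resource_paths and find_unlisted_files are hypothetical, and the second function only approximates the idea of detecting added files; it is not the export module's actual code.

```python
import json
from pathlib import Path


def list_resource_paths(datapackage_file):
    """Return the path of every resource listed in a data package manifest."""
    with open(datapackage_file, encoding="utf-8") as f:
        package = json.load(f)
    paths = []
    # Frictionless data packages keep their resources in a "resources" array.
    for resource in package.get("resources", []):
        path = resource.get("path")
        # A resource "path" may be a single string or a list of strings.
        if isinstance(path, str):
            paths.append(path)
        elif isinstance(path, list):
            paths.extend(path)
    return paths


def find_unlisted_files(project_dir, datapackage_file):
    """Return files under project_dir that the manifest does not list.

    Hypothetical helper: an approximation of "detecting files you have
    added", not the WE1S export module's implementation.
    """
    project_dir = Path(project_dir)
    listed = set(list_resource_paths(datapackage_file))
    on_disk = {
        p.relative_to(project_dir).as_posix()
        for p in project_dir.rglob("*")
        if p.is_file()
    }
    return sorted(on_disk - listed)
```

Calling find_unlisted_files on a project folder and its datapackage.json would return the relative paths of files that exist on disk but are absent from the manifest, including the manifest itself unless it is listed as a resource.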
The config.py File
The config.py file inside the config folder is a Python file that contains information about the Workspace's server environment and the resources installed with the project template. It is used to restore the project to its original state if you run the models/clear_caches.ipynb notebook.
The modules Folder
The modules folder contains all the project's Jupyter notebooks (and supporting scripts and files). Each module focuses on a particular task, such as creating a topic model or visualizing it. Some modules can be used at any point in your workflow. Others need to be run in a certain order. For instance, the dfr_browser, topic_bubbles, and pyldavis modules create visualizations of topic models, so they will not work until you have run the topic_modeling module.
The project_data Folder
The project_data folder contains all your project's primary data. It is empty when the project is first created and remains empty until you import your data using the import module.
Note
Sample projects providing examples of the contents of each of these components of the Workspace can be found in the examples folder on GitHub.
Importing Data to Your Project
Before you do anything else, you must import some data to your project. The WE1S Workspace accepts data in four formats:
- A zip archive of plain text (.txt) files accompanied by a CSV file containing metadata.
- A zip archive of JSON files combining the textual content and metadata.
- A Frictionless Data data package containing paths to all your data files.
- A query of records in a MongoDB database.
To import your data, navigate to modules/import/import.ipynb. This notebook creates a new folder, project_data/json, and copies your data from its source into this folder, converting it into JSON format if necessary.
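To make the conversion step concrete, here is a minimal sketch of how a zip archive of plain text files and a metadata CSV could be turned into one JSON file per document. It is an illustration under stated assumptions, not the import module's own code: the function name import_text_archive, the "filename" CSV column used to match rows to files, and the "content" key for the document text are all hypothetical.

```python
import csv
import json
import zipfile
from pathlib import Path


def import_text_archive(zip_path, metadata_csv, json_dir):
    """Write one JSON file per .txt file in the archive, merged with its metadata row."""
    json_dir = Path(json_dir)
    json_dir.mkdir(parents=True, exist_ok=True)

    # Index the metadata rows by file name (assumed "filename" column).
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        metadata = {row["filename"]: row for row in csv.DictReader(f)}

    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue  # skip anything that is not a plain text file
            text = archive.read(name).decode("utf-8", errors="replace")
            base = Path(name).name
            # Start from the metadata row, if one exists, then add the text.
            doc = dict(metadata.get(base, {}))
            doc["content"] = text  # assumed key for the document text
            out_file = json_dir / (Path(base).stem + ".json")
            out_file.write_text(
                json.dumps(doc, ensure_ascii=False, indent=2), encoding="utf-8"
            )
            written.append(out_file.name)
    return sorted(written)
```

A call such as import_text_archive("texts.zip", "metadata.csv", "project_data/json") would create the output folder if needed and return the names of the JSON files it wrote.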
Important
Most tools in the WE1S Workspace use the JSON files in the project_data/json folder. These tools assume that the files are compliant with the WE1S manifest schema. The import module provides some functions for converting your metadata fields to the expected format; however, it cannot cover every scenario. You may need to perform some preprocessing on your data prior to import.
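If your metadata uses different field names from those the tools expect, one common preprocessing step is to rename the fields before import. The sketch below shows the general idea only. The function rename_fields and the field names in the example ("headline", "body", "title", "content") are hypothetical; consult the WE1S manifest schema for the names actually required.

```python
def rename_fields(doc, field_map):
    """Return a copy of doc with keys renamed according to field_map.

    Keys that do not appear in field_map are kept unchanged.
    """
    return {field_map.get(key, key): value for key, value in doc.items()}


# Example with hypothetical field names.
record = {"headline": "A sample article", "body": "Full text here.", "year": 2019}
renamed = rename_fields(record, {"headline": "title", "body": "content"})
```

Here renamed holds the same values under the keys "title", "content", and "year", and the original record is left untouched.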
Note
The import module automatically coerces your textual data to UTF-8 character encoding.
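For readers who want to check or normalize encodings before import, the following is a rough sketch of one way such coercion can work: try a short list of candidate encodings, and fall back to replacing undecodable bytes. The function name coerce_to_utf8 and the choice of cp1252 as a second candidate are assumptions made for illustration; the import module may use a different strategy.

```python
def coerce_to_utf8(raw_bytes, candidate_encodings=("utf-8", "cp1252")):
    """Decode raw bytes to text that can be safely written out as UTF-8.

    Each candidate encoding is tried in order. If none succeeds, the bytes
    are decoded as UTF-8 with undecodable bytes replaced by U+FFFD.
    """
    for encoding in candidate_encodings:
        try:
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw_bytes.decode("utf-8", errors="replace")


def rewrite_as_utf8(source_path, target_path):
    """Read a file in an unknown encoding and save it as UTF-8."""
    with open(source_path, "rb") as f:
        text = coerce_to_utf8(f.read())
    with open(target_path, "w", encoding="utf-8") as f:
        f.write(text)
```

Trying UTF-8 first matters because most valid UTF-8 byte sequences also decode without error under single-byte encodings such as cp1252, only to the wrong characters.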
What Next?
Once you have imported your data, you can perform a number of procedures with the other modules. The WE1S project primarily employs topic modeling in its research methodology, so this technique is prominent in the current version of the Workspace. Many of the modules depend on your first having run a topic model on your data. This makes the topic_modeling module a good place to start. The metadata module contains some analysis and visualization tools that do not require a pre-existing topic model.