Introduction to NovAST
NovAST is a deep-learning framework for automated label transfer and novel cell type discovery in spatial transcriptomics.
Installation
Here is the sample command to install packages into a python environment:
# clone the code from GitHub
git clone https://github.com/YMa-lab/NovAST.git
# create a new environemnbt with python==3.11
python -m venv NovAST_env
# activate the python environemtn
source NovAST_env/bin/activate
# install the package
pip install .
Quickstart
NovAST requires both reference and target datasets in AnnData (.h5ad) format. We need to first set up some basic path :
savedir: "./"
train_path="path_to_train_dataset.h5ad"
test_path="path_to_train_dataset.h5ad"
training_mode = "exploration"
# Column name in the reference AnnData that stores cell-type annotations
celltype_name_train = "cell_type"
name = "demo_exploration"
dataset = "dataset_name"
All remaining hyperparameters are defined in the file default_config.yaml.
Users may override any of them directly when calling run_NovAST() if customization is needed.
You can now run NovAST using the specified settings as follows:
args = run_NovAST(
training_mode=training_mode,
train_path=train_path,
test_path=test_path,
celltype_name_train=celltype_name_train,
name=name,
dataset=dataset,
rounds=10
)
For each training round, the pipeline saves all outputs to the specified directory, with each random seed assigned its own subfolder. This includes the trained model, the Stage-1 loss values, and a final result file named adata_unlabeled_final.h5ad, which stores the latent embeddings in .obsm['X_latent'] and the final predicted labels in .obs['voted_final_prediction'].
Running the following line of code will generate UMAP visualizations as well as spatial plots of the predicted cell types and their associated confidence scores, and save them to each individual seed’s output directory.
NovAST_plot(args)
Tutorials
We further provides the tutorials on the datasets presented in the main figures in the manuscript.