RustyUI/crates/stable-diffusion-burn/burn-crates/burn-collective/multinode-tests/README.md

Integration test for burn collective operations with multiple nodes and devices.

Run the test with: cargo run --bin test_launcher

There are 3 binaries:

node.rs

Launches n threads, each simulating a different device. The backend is currently NdArray, so everything runs on the CPU. The program takes a file containing its configuration and input data.
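As a rough illustration of the one-thread-per-device idea, here is a minimal sketch using only the standard library. The function name `run_simulated_devices` and the "sum the local input" workload are assumptions made for this example; the real node.rs runs collective operations on NdArray tensors instead.

```rust
use std::thread;

/// Spawns one thread per simulated device and collects one result per device.
/// `device_inputs[i]` is the (hypothetical) input data for device `i`.
fn run_simulated_devices(device_inputs: Vec<Vec<f32>>) -> Vec<f32> {
    let handles: Vec<_> = device_inputs
        .into_iter()
        .enumerate()
        .map(|(device_id, input)| {
            thread::spawn(move || {
                // Stand-in workload: a real node would take part in a
                // collective operation here. This sketch only sums its input.
                let local_sum: f32 = input.iter().sum();
                (device_id, local_sum)
            })
        })
        .collect();

    // Join every thread and store its result at the device's index.
    let mut results = vec![0.0; handles.len()];
    for handle in handles {
        let (device_id, local_sum) = handle.join().expect("device thread panicked");
        results[device_id] = local_sum;
    }
    results
}
```

Calling `run_simulated_devices(vec![vec![1.0, 2.0], vec![3.0]])` starts two threads and returns one value per device, in device order.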

global.rs

Runs the global orchestrator, which responds to global collective operation requests. For an all-reduce, the orchestrator replies with a reduction strategy, and each node can then carry out the reduction independently.
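For readers unfamiliar with all-reduce, the sketch below shows what the operation guarantees: every participant ends up holding the same reduced tensor. It is a single-process reference version, not the strategy the orchestrator hands out. `ReduceOp` and `all_reduce` are hypothetical names for this example, not burn-collective types.

```rust
/// Reduction operators for this sketch (hypothetical, not burn-collective's API).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum ReduceOp {
    Sum,
    Mean,
}

/// In-place all-reduce over equally sized buffers, one buffer per device.
/// Afterwards every buffer holds the element-wise reduction of all inputs.
fn all_reduce(buffers: &mut [Vec<f32>], op: ReduceOp) {
    let Some(first) = buffers.first() else { return };
    let len = first.len();
    assert!(
        buffers.iter().all(|b| b.len() == len),
        "all buffers must have the same length"
    );

    // Reduce: accumulate the element-wise sum of every buffer.
    let mut reduced = vec![0.0f32; len];
    for buf in buffers.iter() {
        for (acc, x) in reduced.iter_mut().zip(buf) {
            *acc += *x;
        }
    }
    if let ReduceOp::Mean = op {
        let n = buffers.len() as f32;
        for acc in reduced.iter_mut() {
            *acc /= n;
        }
    }

    // Broadcast: every participant receives the same result.
    for buf in buffers.iter_mut() {
        buf.copy_from_slice(&reduced);
    }
}
```

A distributed implementation would split the reduce and broadcast phases across nodes according to the chosen strategy, but the end state should match this reference.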

test_launcher.rs

Generates input data, calculates the expected results, and launches the nodes, each with its own inputs in a separate file.

The topology is [4, 4, 4, 4]. This means 4 nodes are launched, each with 4 threads (one per device), for 16 simulated devices in total.

The global orchestrator (global.rs) is also launched.
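The launcher's bookkeeping can be sketched as follows: walk the topology, give every device its own input, and compute the expected all-reduce result up front. This assumes a sum reduction and a simple deterministic input (each device's tensor is filled with its global index). `generate_inputs` and `expected_all_reduce_sum` are hypothetical helpers; the real test_launcher.rs may generate data differently.

```rust
/// Topology from this README: one entry per node, giving its device count.
const TOPOLOGY: [usize; 4] = [4, 4, 4, 4];

/// Builds inputs grouped as node -> device -> tensor data.
/// Device `k` (counted globally across nodes) gets a tensor filled with `k`.
fn generate_inputs(topology: &[usize], tensor_len: usize) -> Vec<Vec<Vec<f32>>> {
    let mut global_device = 0usize;
    topology
        .iter()
        .map(|&devices| {
            (0..devices)
                .map(|_| {
                    let value = global_device as f32;
                    global_device += 1;
                    vec![value; tensor_len]
                })
                .collect()
        })
        .collect()
}

/// Expected result of an all-reduce (sum) over every device on every node.
fn expected_all_reduce_sum(inputs: &[Vec<Vec<f32>>], tensor_len: usize) -> Vec<f32> {
    let mut expected = vec![0.0; tensor_len];
    for tensor in inputs.iter().flatten() {
        for (acc, x) in expected.iter_mut().zip(tensor) {
            *acc += *x;
        }
    }
    expected
}
```

With `TOPOLOGY` and a tensor length of 3, this yields 4 node groups of 4 tensors each, and the expected sum is 0 + 1 + ... + 15 = 120 in every element.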

Output

The inputs and outputs for each node and the orchestrator are written to the target/test_files folder.

A timeout guards against the nodes or the orchestrator stalling.
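One common way to implement such a guard is to wait on a channel with a deadline. The sketch below does this for work running in a thread; `run_with_timeout` is a hypothetical helper, and the actual launcher supervises separate processes, so its mechanism may differ.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on its own thread and waits at most `timeout` for the result.
/// Returns `None` if the work did not finish in time.
fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up, so a failed send is ignored.
        let _ = tx.send(work());
    });
    // `recv_timeout` returns an error on timeout or if the worker panicked.
    rx.recv_timeout(timeout).ok()
}
```

Note that the worker thread is not cancelled when the timeout fires; it is only abandoned. For child processes, a similar effect could be had by polling `std::process::Child::try_wait` until a deadline and then calling `kill`.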