Config
Band uses four configuration objects, each covering one part of the system: ProfileConfig, PlannerConfig, WorkerConfig, and RuntimeConfig.
Each configuration field is either optional or required. An optional field is guaranteed to have a default value. A required field has no default: RuntimeConfigBuilder cannot generate a configuration unless the field is specified.
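The optional/required distinction can be sketched with a small stand-alone builder. Everything here is hypothetical and only mirrors the idea: `SketchConfig`, `SketchConfigBuilder`, and `SketchScheduler` are not Band types, and reporting a missing required field through `std::nullopt` (C++17) is an assumption, since this page does not say how RuntimeConfigBuilder signals the failure.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Band's types.
enum class SketchScheduler { kRoundRobin, kLeastSlackTimeFirst };

struct SketchConfig {
  bool online;                              // optional field
  std::vector<SketchScheduler> schedulers;  // required field
};

class SketchConfigBuilder {
 public:
  SketchConfigBuilder& AddOnline(bool online) {
    online_ = online;
    return *this;
  }

  SketchConfigBuilder& AddSchedulers(std::vector<SketchScheduler> schedulers) {
    schedulers_ = std::move(schedulers);
    return *this;
  }

  // Assumption: a missing required field is reported as std::nullopt.
  std::optional<SketchConfig> Build() const {
    if (!schedulers_.has_value()) return std::nullopt;
    return SketchConfig{online_, *schedulers_};
  }

 private:
  bool online_ = true;  // optional: a default always exists
  std::optional<std::vector<SketchScheduler>> schedulers_;  // required: no default
};
```

In this sketch, calling `Build()` without `AddSchedulers(...)` yields no configuration, while omitting `AddOnline(...)` simply leaves the default in place.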
Enumeration Types
BandSchedulerType
- kFixedDeviceFixedWorker
- kRoundRobin
- kShortestExpectedLatency
- kFixedDeviceGlobalQueue
- kHeterogeneousEarliestFinishTime
- kLeastSlackTimeFirst
- kHeterogeneousEarliestFinishTimeReserved

CPUMaskFlags
- kAll
- kLittle
- kBig
- kPrimary

DeviceFlags
- kCPU
- kGPU
- kDSP
- kNPU

SubgraphPreparationType
- kNoFallbackSubgraph
- kFallbackPerWorker
- kUnitSubgraph
- kMergeUnitSubgraph
ProfileConfig
- online [type: bool, default: true]: Profile online if true, offline if false.
- num_warmups [type: int, default: 1]: The number of warmup runs before profiling.
- num_runs [type: int, default: 1]: The number of runs for profiling.
- copy_computation_ratio [type: std::vector<int>, default: [30000, ...]]: The ratio of computation to input-output copy, used for latency estimation. The size of the list should be the same as the number of devices.
- smoothing_factor [type: float, default: 0.1]: The momentum with which newly profiled data is reflected: <updated_profile> = <smoothing_factor> * <curr_profile> + (1. - <smoothing_factor>) * <prev_profile>.
- profile_data_path [type: std::string, default: ""]: The input path to the file for offline profile results. If not specified, the field is ignored and no result file is generated.
PlannerConfig
- schedule_window_size [type: int, default: INT_MAX]: The size of the window that the scheduler will use.
- schedulers [type: std::vector<SchedulerType>, required]: The types of schedulers. If N schedulers are specified, N queues will be generated.
- cpu_mask [type: CPUMaskFlags, default: kAll]: The CPU mask used to set CPU affinity.
- log_path [type: std::string, default: ""]: The output path to the file for the planner’s log. If not specified, the field is ignored and no log file is generated.
WorkerConfig
- workers [type: std::vector<DeviceFlags>, default: [kCPU, kGPU, ...]]: The list of target devices. By default, one worker per device is generated.
- cpu_masks [type: std::vector<CPUMaskFlags>, default: [kAll, kAll, ...]]: CPU masks used to set CPU affinity. The size of the list must be the same as the size of workers.
- num_threads [type: std::vector<int>, default: [1, 1, ...]]: The number of threads. The size of the list must be the same as the size of workers.
- allow_worksteal [type: bool, default: false]: Work-stealing is enabled if true, disabled if false.
- availability_check_interval_ms [type: int, default: 30000]: The interval for checking the availability of devices. Used for detecting thermal throttling.
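The size constraints on cpu_masks and num_threads can be expressed as a simple consistency check. The sketch below uses hypothetical mirror types (`SketchDevice`, `SketchCPUMask`, `SketchWorkerLists`) rather than Band's own; whether and where Band performs such a check is not stated on this page.

```cpp
#include <vector>

// Hypothetical mirrors of the enumerations above; not Band's definitions.
enum class SketchDevice { kCPU, kGPU, kDSP, kNPU };
enum class SketchCPUMask { kAll, kLittle, kBig, kPrimary };

// Hypothetical container for the three per-worker lists of WorkerConfig.
struct SketchWorkerLists {
  std::vector<SketchDevice> workers;
  std::vector<SketchCPUMask> cpu_masks;
  std::vector<int> num_threads;
};

// Returns true when cpu_masks and num_threads each have exactly one
// entry per worker, as the field descriptions require.
bool HasConsistentSizes(const SketchWorkerLists& lists) {
  return lists.cpu_masks.size() == lists.workers.size() &&
         lists.num_threads.size() == lists.workers.size();
}
```

A practical consequence is that overriding workers alone may leave the default cpu_masks and num_threads lists with the wrong length, so the three lists are best set together.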
RuntimeConfig
RuntimeConfig contains ProfileConfig, PlannerConfig and WorkerConfig.
- minimum_subgraph_size [type: int, default: 7]: The minimum subgraph size. If a candidate subgraph is smaller than this, the subgraph will not be created.
- subgraph_preparation_type [type: SubgraphPreparationType, default: kMergeUnitSubgraph]: For fallback schedulers, determines how candidate subgraphs are generated.
- cpu_mask [type: CPUMaskFlags, default: kAll]: The CPU mask for Band Engine.
RuntimeConfigBuilder API
RuntimeConfigBuilder delegates to every builder that inherits from ConfigBuilder.
It is a friend class of all the other ConfigBuilder classes, so take care not to modify their members directly inside RuntimeConfigBuilder.
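The delegation and friendship relationship can be pictured with the following sketch. The class names (`SketchProfileBuilder`, `SketchRuntimeBuilder`) and the single `online_` member are hypothetical; the sketch only illustrates the pattern and is not Band's actual class layout.

```cpp
class SketchRuntimeBuilder;  // forward declaration of the friend

// Hypothetical sub-builder; stands in for one of the ConfigBuilder classes.
class SketchProfileBuilder {
  friend class SketchRuntimeBuilder;  // grants access to private members

 public:
  SketchProfileBuilder& AddOnline(bool online) {
    online_ = online;
    return *this;
  }

 private:
  bool online_ = true;
};

// Hypothetical top-level builder that owns and delegates to the sub-builder.
class SketchRuntimeBuilder {
 public:
  // Delegation: forward the call to the sub-builder's public method
  // instead of writing to its private member directly.
  SketchRuntimeBuilder& AddOnline(bool online) {
    profile_builder_.AddOnline(online);
    return *this;
  }

  // Friendship permits reading the private member, for example when
  // assembling the final configuration.
  bool online() const { return profile_builder_.online_; }

 private:
  SketchProfileBuilder profile_builder_;
};
```

Because friendship makes `profile_builder_.online_ = ...` compile inside `SketchRuntimeBuilder`, nothing but convention keeps writes going through the sub-builder's own methods, which is why the warning above matters.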
Example Usage
    RuntimeConfigBuilder b;
    auto config = b.AddOnline(false)  // Default was `true`
                      .AddSmoothingFactor(0.3)  // Default was `0.1`
                      .AddSchedulers({SchedulerType::kRoundRobin,
                                      SchedulerType::kLeastSlackTimeFirst})  // Required field.
                      .Build();
Methods
All Add* methods are idempotent, i.e. calling a method multiple times with the same argument behaves the same as calling it once.
- AddOnline(bool online)
- AddNumWarmups(int num_warmups)
- AddNumRuns(int num_runs)
- AddCopyComputationRatio(std::vector<int> copy_computation_ratio)
- AddSmoothingFactor(float smoothing_factor)
- AddProfileLogPath(std::string profile_data_path)
- AddPlannerLogPath(std::string planner_log_path)
- AddScheduleWindowSize(int schedule_window_size)
- AddSchedulers(std::vector<SchedulerType> schedulers)
- AddPlannerCPUMask(CPUMaskFlags cpu_masks)
- AddWorkers(std::vector<DeviceFlags> workers)
- AddWorkerCPUMasks(std::vector<CPUMaskFlags> cpu_masks)
- AddWorkerNumThreads(std::vector<int> num_threads)
- AddAllowWorkSteal(bool allow_worksteal)
- AddAvailabilityCheckIntervalMs(int32_t availability_check_interval_ms)
- AddMinimumSubgraphSize(int minimum_subgraph_size)
- AddSubgraphPreparationType(SubgraphPreparationType subgraph_preparation_type)
- AddCPUMask(CPUMaskFlags cpu_mask)
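The idempotency of the Add* methods suggests that each call overwrites the stored value rather than accumulating, including for list-valued fields. The sketch below illustrates that reading with a hypothetical `SketchBuilder`; it is an inference from the idempotency statement, not a copy of Band's implementation.

```cpp
#include <utility>
#include <vector>

// Hypothetical builder for illustration; not Band's implementation.
class SketchBuilder {
 public:
  // Overwrites the stored value, so repeating the call changes nothing.
  SketchBuilder& AddNumRuns(int num_runs) {
    num_runs_ = num_runs;
    return *this;
  }

  // Replaces the whole list instead of appending to it; appending would
  // make a repeated call grow the list and break idempotency.
  SketchBuilder& AddWorkerNumThreads(std::vector<int> num_threads) {
    num_threads_ = std::move(num_threads);
    return *this;
  }

  int num_runs() const { return num_runs_; }
  const std::vector<int>& num_threads() const { return num_threads_; }

 private:
  int num_runs_ = 1;
  std::vector<int> num_threads_;
};
```

Under this reading, the "Add" prefix does not mean "append": calling `AddWorkerNumThreads({1, 2})` twice leaves a two-element list, not a four-element one.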