Preprocessing

Overview

Band provides a set of APIs to preprocess data. Preprocessing is a mandatory and often time-consuming step when running a machine learning model on Band. To make preprocessing pipelines efficient to develop, Band provides Buffer and BufferProcessor.

Buffer is a generic wrapper around data. It pairs data such as an image, text, or a Band Tensor with the metadata needed by the preprocessing pipeline. BufferProcessor is a set of APIs that preprocess a Buffer. It provides basic operations such as resize, normalize, and crop. Only image preprocessing is currently supported; support for other data types such as text and audio is planned.

Example Usage - ImageProcessor

The example below shows how to use BufferProcessor to preprocess image data. It creates a Buffer from raw RGB data with (width, height) dimensions and 3 channels, then builds an ImageProcessor with Resize and Normalize operations. Finally, it processes the Buffer and writes the result into a Tensor with (224, 224) dimensions, 3 channels, and the kFloat32 data type, normalized with a mean of 127.5 and a standard deviation of 127.5.

Tensor* tensor = ... // Create a tensor

// Create a buffer from raw RGB data
unsigned char* data = new unsigned char[width * height * 3];
... // Fill the data
Buffer* buffer = Buffer::CreateFromRaw(data, width, height, 3, BufferFormat::kRGB, DataType::kUInt8);

// Create an image processor
ImageProcessorBuilder builder;
builder.SetResize(224, 224);
builder.SetNormalize(127.5f, 127.5f);
absl::StatusOr<std::unique_ptr<BufferProcessor>> preprocessor =
      builder.Build();
if (!preprocessor.ok()) {
  ... // Handle the error in preprocessor.status()
}

// Preprocess the buffer (dereference the StatusOr, then the unique_ptr)
(*preprocessor)->process(*buffer, *tensor);
... // Use the tensor
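
To make the effect of the Normalize step in the example concrete, here is a minimal standalone sketch that does not use Band. It assumes that normalization computes (value - mean) / std per sample, which is the conventional definition but is not confirmed by this document. NormalizePixels is a hypothetical helper, not a Band API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Band): converts 8-bit samples to float and
// applies (value - mean) / std_dev. The formula itself is an assumption.
std::vector<float> NormalizePixels(const std::vector<std::uint8_t>& pixels,
                                   float mean, float std_dev) {
  assert(std_dev != 0.0f);  // Division by zero would be meaningless here.
  std::vector<float> out;
  out.reserve(pixels.size());
  for (std::uint8_t p : pixels) {
    out.push_back((static_cast<float>(p) - mean) / std_dev);
  }
  return out;
}
```

Under this assumption, a mean and standard deviation of 127.5 map 8-bit values in [0, 255] to floats in [-1, 1], a common input range for image models.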

Buffer

A Buffer can be created from the following data types and metadata:

  • raw data, width, height, BufferFormat, DataType, and BufferOrientation (BufferFormat::kGrayScale, BufferFormat::kRGB, BufferFormat::kRGBA, and BufferFormat::kRaw only)
  • y plane, u plane, v plane, width, height, row stride of y plane, row stride of uv plane, pixel stride of uv plane, BufferFormat, DataType, and BufferOrientation (BufferFormat::kYV12, BufferFormat::kYV21, BufferFormat::kNV21, and BufferFormat::kNV12 only)
  • Tensor

Currently, every BufferFormat other than kRaw supports only the kUInt8 DataType.
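
The stride parameters in the YUV case describe how to locate a sample inside each plane. The following standalone sketch shows the usual addressing arithmetic for YUV 4:2:0 data, where one chroma sample is shared by a 2x2 block of pixels. YuvLayout, YOffset, and UvOffset are hypothetical names used for illustration only; they are not Band types or functions, and Band's internal indexing may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a YUV 4:2:0 layout (not a Band type).
struct YuvLayout {
  std::size_t y_row_stride;     // Bytes between consecutive rows of the Y plane.
  std::size_t uv_row_stride;    // Bytes between consecutive rows of the U/V planes.
  std::size_t uv_pixel_stride;  // Bytes between consecutive chroma samples in a row.
};

// Byte offset of the luma sample for pixel (x, y) within the Y plane.
std::size_t YOffset(const YuvLayout& layout, std::size_t x, std::size_t y) {
  return y * layout.y_row_stride + x;
}

// Byte offset of the chroma sample for pixel (x, y) within the U (or V) plane.
// In 4:2:0, both coordinates are halved because chroma is subsampled 2x2.
std::size_t UvOffset(const YuvLayout& layout, std::size_t x, std::size_t y) {
  return (y / 2) * layout.uv_row_stride + (x / 2) * layout.uv_pixel_stride;
}
```

For interleaved formats such as NV12 and NV21, the pixel stride of the uv plane is typically 2 and the u and v plane pointers are one byte apart. For planar formats it is typically 1.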

Enumeration Types

  • BufferFormat

    • kGrayScale - 8-bit gray scale
    • kRGB - 8-bit RGB
    • kRGBA - 8-bit RGBA
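    • kNV21 - YUV 4:2:0, 8 bits per channel, Y plane followed by interleaved V/U samples
    • kNV12 - YUV 4:2:0, 8 bits per channel, Y plane followed by interleaved U/V samples
    • kYV12 - YUV 4:2:0, 8 bits per channel, planar (Y, V, U order)
    • kYV21 - YUV 4:2:0, 8 bits per channel, planar (Y, U, V order)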
    • kRaw - raw data
  • DataType

    • kNoType
    • kFloat32
    • kInt32
    • kUInt8
    • kInt64
    • kString
    • kBool
    • kInt16
    • kComplex64
    • kInt8
    • kFloat16
    • kFloat64

BufferProcessor

ImageProcessor

ImageProcessor supports the following operations:

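  • Crop(int x0, int y0, int x1, int y1): crop to the rectangle from the top-left corner (x0, y0) to the bottom-right corner (x1, y1), both inclusive
  • Resize(int width, int height): resize to a new size
  • Rotate(float angle): rotate counter-clockwise by an angle between 0 and 360, in multiples of 90
  • Flip(bool horizontal, bool vertical): flip horizontally, vertically, or both
  • ConvertColorSpace(BufferFormat target_format): convert the color space
  • Normalize(float mean, float std): normalize with the given mean and standard deviation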
  • DataTypeConvert(): convert the data type to the output data type, e.g., convert from 8-bit RGB to 32-bit float RGB (tensor).
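
Because the Crop coordinates are inclusive, the cropped image is (x1 - x0 + 1) pixels wide and (y1 - y0 + 1) pixels tall. The standalone sketch below illustrates that arithmetic on a tightly packed, interleaved image. CropInclusive is a hypothetical helper written for this illustration; it is not Band's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Band API): crops an interleaved image whose rows
// are tightly packed. Both (x0, y0) and (x1, y1) are inclusive.
std::vector<std::uint8_t> CropInclusive(const std::vector<std::uint8_t>& src,
                                        int width, int height, int channels,
                                        int x0, int y0, int x1, int y1) {
  assert(0 <= x0 && x0 <= x1 && x1 < width);
  assert(0 <= y0 && y0 <= y1 && y1 < height);
  assert(src.size() == static_cast<std::size_t>(width) * height * channels);

  // Inclusive bounds: the output spans x0..x1 and y0..y1.
  const std::ptrdiff_t row_bytes =
      static_cast<std::ptrdiff_t>(x1 - x0 + 1) * channels;
  std::vector<std::uint8_t> dst;
  dst.reserve(static_cast<std::size_t>(row_bytes) * (y1 - y0 + 1));
  for (int y = y0; y <= y1; ++y) {
    // Start of the cropped segment within source row y.
    const std::ptrdiff_t row_start =
        (static_cast<std::ptrdiff_t>(y) * width + x0) * channels;
    dst.insert(dst.end(), src.begin() + row_start,
               src.begin() + row_start + row_bytes);
  }
  return dst;
}
```

For example, cropping a 4x3 single-channel image with (x0, y0) = (1, 1) and (x1, y1) = (2, 2) yields a 2x2 result, not 1x1.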

ImageProcessorBuilder provides a simple way to create an ImageProcessor. The user specifies the operations on the builder, and ImageProcessorBuilder creates an ImageProcessor that applies them.

By default, an ImageProcessorBuilder with no operations creates an ImageProcessor that maps the entire Buffer directly to the Tensor without normalization. This covers the most common preprocessing use case.