OpenCV Tutorials

Mat - The Basic Image Container

The first thing you need to be familiar with is how OpenCV stores and handles images.

Mat is basically a class with two data parts: the matrix header (containing information such as the size of the matrix, the method used for storing, the address at which the matrix is stored, and so on) and a pointer to the matrix containing the pixel values (taking any dimensionality depending on the method chosen for storing). The matrix header size is constant; however, the size of the matrix itself may vary from image to image and is usually larger by orders of magnitude.

What you need to remember from all this is that:

  • Output image allocation for OpenCV functions is automatic (unless specified otherwise).
  • You do not need to think about memory management with OpenCV's C++ interface.
  • The assignment operator and the copy constructor copy only the header.
  • The underlying matrix of an image may be copied using the cv::Mat::clone() and cv::Mat::copyTo() functions.
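
The difference between copying a header and copying the data can be sketched without OpenCV. The following is a minimal sketch and not OpenCV's implementation: TinyMat is a hypothetical stand-in for Mat, and std::shared_ptr plays the role of the reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cv::Mat: a small header plus a shared pixel buffer.
struct TinyMat {
    int rows;
    int cols;
    std::shared_ptr<std::vector<unsigned char>> data;  // reference-counted buffer

    TinyMat(int r, int c, unsigned char value)
        : rows(r), cols(c),
          data(std::make_shared<std::vector<unsigned char>>(
              static_cast<std::size_t>(r) * static_cast<std::size_t>(c), value)) {}

    // Deep copy: allocates a new buffer, similar in spirit to cv::Mat::clone().
    TinyMat clone() const {
        TinyMat copy(rows, cols, 0);
        *copy.data = *data;
        return copy;
    }

    // Element access with a bounds check.
    unsigned char& at(int r, int c) {
        assert(r >= 0 && r < rows && c >= 0 && c < cols);
        return (*data)[static_cast<std::size_t>(r) * cols + c];
    }
};
```

With this sketch, the implicitly generated copy constructor copies only rows, cols and the shared pointer, so `TinyMat b = a;` makes b and a share one buffer, while `a.clone()` yields an independent buffer.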

The really interesting part is that you can create headers that refer to only a subsection of the full data. For example, to create a region of interest (ROI) in an image you just create a new header with the new boundaries:

Mat D (A, Rect(10, 10, 100, 100) ); // using a rectangle
Mat E = A(Range::all(), Range(1,3)); // using row and column boundaries

Storing methods

This is about how you store the pixel values. You can select the color space and the data type used.

There are many color systems, each with its own advantages:

  • RGB is the most common, as our eyes use something similar; however, keep in mind that OpenCV's standard display system composes colors using the BGR color space (the red and blue channels are swapped).
  • HSV and HLS decompose colors into their hue, saturation and value/luminance components, which is a more natural way for us to describe colors. You might, for example, dismiss the last component, making your algorithm less sensitive to the light conditions of the input image.
  • YCrCb is used by the popular JPEG image format.
  • CIE L*a*b* is a perceptually uniform color space, which comes in handy if you need to measure the distance of a given color to another color.

Creating a Mat object explicitly

Although Mat works really well as an image container, it is also a general matrix class. Therefore, it is possible to create and manipulate multidimensional matrices.

  • cv::Mat::Mat Constructor
Mat M(2,2, CV_8UC3, Scalar(0,0,255)); // row, column
cout << "M = " << endl << " " << M << endl << endl;
After the row and column counts, we need to specify the data type to use for storing the elements and the number of channels per matrix point.

We have multiple definitions constructed according to the following convention:

CV_[The number of bits per item][Signed or Unsigned][Type Prefix]C[The channel number]

For instance, CV_8UC3 means we use unsigned char types that are 8 bits long and each pixel has three of these to form the three channels. These are predefined for up to four channels. The cv::Scalar is a four-element short vector. Specify it and you can initialize all matrix points with a custom value. If you need more channels, you can create the type with the macro above, setting the channel number in parentheses as you can see below.

  • Use C/C++ arrays and initialize via constructor
int sz[3] = {2,2,2};
Mat L(3,sz, CV_8UC(1), Scalar::all(0));

The example above shows how to create a matrix with more than two dimensions. Specify the number of dimensions, then pass a pointer to an array containing the size of each dimension; the rest remains the same.


What does the (1) in CV_8UC(1) mean?

  • cv::Mat::create function:
M.create(4,4, CV_8UC(2));
cout << "M = " << endl << " " << M << endl << endl;

You cannot initialize the matrix values with this construction. It will only reallocate its matrix data memory if the new size will not fit into the old one.

  • MATLAB style initializer: cv::Mat::zeros, cv::Mat::ones, cv::Mat::eye. Specify size and data type to use:

    Mat E = Mat::eye(4, 4, CV_64F);
    cout << "E = " << endl << " " << E << endl << endl;
    Mat O = Mat::ones(2, 2, CV_32F);
    cout << "O = " << endl << " " << O << endl << endl;
    Mat Z = Mat::zeros(3,3, CV_8UC1);
    cout << "Z = " << endl << " " << Z << endl << endl;
  • For small matrices you may use comma separated initializers or initializer lists (C++11 support is required in the last case):

Mat C = (Mat_<double>(3,3) << 0, -1, 0, -1, 5, -1, 0, -1, 0);
cout << "C = " << endl << " " << C << endl << endl;
C = (Mat_<double>({0, -1, 0, -1, 5, -1, 0, -1, 0})).reshape(3);
cout << "C = " << endl << " " << C << endl << endl;


Is Mat_<double>() C++ syntax?

  • Create a new header for an existing Mat object and cv::Mat::clone or cv::Mat::copyTo it.
Mat RowClone = C.row(1).clone();
cout << "RowClone = " << endl << " " << RowClone << endl << endl;

Output formatting

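Mat R = Mat(3, 2, CV_8UC3);
randu(R, Scalar::all(0), Scalar::all(255)); // fill R with random values so there is something to print
cout << "R (default) = " << endl <<        R           << endl << endl;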
cout << "R (python)  = " << endl << format(R, Formatter::FMT_PYTHON) << endl << endl;
cout << "R (csv)     = " << endl << format(R, Formatter::FMT_CSV   ) << endl << endl;
cout << "R (numpy)   = " << endl << format(R, Formatter::FMT_NUMPY ) << endl << endl;
cout << "R (c)       = " << endl << format(R, Formatter::FMT_C     ) << endl << endl;

How to scan images, lookup tables and time measurement with OpenCV


We’ll seek answers for the following questions:

  • How to go through each and every pixel of an image?
  • How are OpenCV matrix values stored?
  • How to measure the performance of our algorithm?
  • What are lookup tables and why use them?

Our test case

Let us consider a simple color reduction method. By using the unsigned char C and C++ type for matrix item storing, a channel of a pixel may have up to 256 different values. For a three channel image this allows the formation of way too many colors (16 million to be exact). Working with so many color shades may give a heavy blow to our algorithm's performance. However, sometimes it is enough to work with far fewer of them to get the same final result.

In these cases it’s common that we make a color space reduction. This means that we divide the color space's current value by a new input value to end up with fewer colors. For instance, every value between zero and nine takes the new value zero, every value between ten and nineteen the value ten, and so on. The division is an integer division, so the remainder is discarded:

\[I_{new} = \left\lfloor \frac{I_{old}}{10} \right\rfloor \cdot 10\]

A simple color space reduction algorithm would consist of just passing through every pixel of an image matrix and applying this formula. It’s worth noting that we do a divide and a multiplication operation. These operations are bloody expensive for a system. If possible, it’s worth avoiding them by using cheaper operations such as a few subtractions, additions or, in the best case, a simple assignment. Furthermore, note that we only have a limited number of input values for the operation above. In the case of the uchar system this is 256 to be exact.

Therefore, for larger images it would be wise to calculate all possible values beforehand and during the scan just make the assignment, by using a lookup table. Lookup tables are simple arrays (having one or more dimensions) that for a given input value hold the final output value. Their strength is that we do not need to make the calculation; we just need to read the result.

Here we first use the C++ stringstream class to convert the third command line argument from text to an integer format. Then we use a simple loop and the formula above to calculate the lookup table. No OpenCV specific stuff here.

int divideWith = 0; // convert our input string to number - C++ style
stringstream s;
s << argv[2];
s >> divideWith;
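
The loop that fills the lookup table is not shown above, so here is a minimal sketch of it in plain C++ with no OpenCV types. The function names buildReductionTable and reduceColors are illustrative choices and not part of any library.

```cpp
#include <cassert>
#include <vector>

// Build the 256-entry table: every input value maps to the start of its bin,
// i.e. table[i] = divideWith * (i / divideWith) using integer division.
std::vector<unsigned char> buildReductionTable(int divideWith) {
    assert(divideWith > 0 && divideWith <= 255);
    std::vector<unsigned char> table(256);
    for (int i = 0; i < 256; ++i)
        table[i] = static_cast<unsigned char>(divideWith * (i / divideWith));
    return table;
}

// Apply the table in place to a flat buffer of 8-bit values:
// one array read per value, no division or multiplication.
void reduceColors(std::vector<unsigned char>& pixels,
                  const std::vector<unsigned char>& table) {
    assert(table.size() == 256);
    for (unsigned char& p : pixels)
        p = table[p];
}
```

With divideWith set to 10, the values 0 to 9 map to 0, the values 10 to 19 map to 10, and 255 maps to 250.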

Another issue is how to measure time. OpenCV offers two simple functions to achieve this: cv::getTickCount() and cv::getTickFrequency(). The first returns the number of ticks of your system's CPU from a certain event (such as since you booted your system). The second returns how many times your CPU emits a tick during a second.
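
A worked formula may make the relationship between the two functions concrete. Assuming N_start and N_end are two values returned by cv::getTickCount() before and after the measured code (the symbol names are ours) and f is the value returned by cv::getTickFrequency(), the elapsed time in seconds is:

```latex
t = \frac{N_{end} - N_{start}}{f}
```

For example, with assumed values of 3,000,000 elapsed ticks and f = 1,000,000,000 ticks per second, t = 0.003 s.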

How is the image matrix stored in memory?

The size of the matrix depends on the color system used. More accurately, it depends on the number of channels used.

Because in many cases the memory is large enough to store the rows one after another, the rows may follow each other directly, creating a single long row. Because everything is in a single place, one item following another, this may help to speed up the scanning process. We can use the cv::Mat::isContinuous() function to ask the matrix if this is the case.
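
To make the layout concrete, here is a small sketch of where one channel value sits when the rows are stored back to back. It assumes interleaved channels and no padding between rows (the continuous case); continuousOffset is a hypothetical helper and not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>

// Offset (in elements) of channel `ch` of the pixel at (row, col), assuming
// rows are stored back to back with no padding and channels are interleaved.
std::size_t continuousOffset(int row, int col, int ch, int cols, int channels) {
    assert(row >= 0 && col >= 0 && col < cols && ch >= 0 && ch < channels);
    return (static_cast<std::size_t>(row) * cols + col) * channels + ch;
}
```

For a 3 channel image that is 4 columns wide, the first value of row 1 sits at offset 12, right after the 12 values of row 0.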

The efficient way

When it comes to performance you cannot beat the classic C style operator[] (pointer) access. Therefore, the most efficient method we can recommend for making the assignment is:

Mat& ScanImageAndReduceC(Mat& I, const uchar* const table)
{
    // accept only char type matrices
    CV_Assert(I.depth() == CV_8U);
    int channels = I.channels();
    int nRows = I.rows;
    int nCols = I.cols * channels;
    if (I.isContinuous())
    {
        // continuous storage: treat the whole matrix as one long row
        nCols *= nRows;
        nRows = 1;
    }
    int i,j;
    uchar* p;
    for( i = 0; i < nRows; ++i)
    {
        p = I.ptr<uchar>(i);
        for ( j = 0; j < nCols; ++j)
        {
            p[j] = table[p[j]];
        }
    }
    return I;
}


What syntax is the <uchar> in I.ptr<uchar>(i)?

When the matrix is stored in a continuous manner, we only need to request the pointer a single time and go all the way to the end.

uchar* cv::Mat::ptr( int i0 = 0 )
cv::Mat::ptr returns a uchar* or typed pointer to the specified matrix row.
cv::Mat::at returns a reference to the specified array element.
cv::LUT transforms the source matrix into the destination matrix using the given look-up table.

The iterator (safe) method

With the efficient way, making sure that you pass through the right number of uchar fields and skip the gaps that may occur between the rows was your responsibility. The iterator method is considered safer as it takes over these tasks from the user.

In case of color images we have three uchar items per column. This may be considered a short vector of uchar items, that has been baptized in OpenCV with the Vec3b name. To access the n-th sub column we use simple operator[] access. It’s important to remember that OpenCV iterators go through the columns and automatically skip to the next row. Therefore in case of color images if you use a simple uchar iterator you’ll be able to access only the blue channel values.

On-the-fly address calculation with reference returning

The final method isn’t recommended for scanning. It was made to acquire or modify arbitrary elements in the image.

The function takes your input type and coordinates and calculates the address of the queried item on the fly.

If you need to do multiple lookups using this method for an image, it may be troublesome and time consuming to enter the type and the at keyword for each of the accesses. To solve this problem OpenCV has a cv::Mat_ data type. It’s the same as Mat, with the extra requirement that at definition you specify the data type through which to look at the data matrix; in return you can use operator() for fast access to items. To make things even better, it is easily convertible from and to the usual cv::Mat data type. A sample usage of this can be seen in the color image case of the function above. Nevertheless, it’s important to note that the same operation (with the same runtime speed) could have been done with the cv::Mat::at() function. It’s just a trick that saves the lazy programmer some typing.

The Core Function

This is a bonus method of achieving lookup table modification in an image. In image processing it’s quite common that you want to modify all of a given image's values to some other value. OpenCV provides a function for modifying image values, without the need to write the scanning logic of the image. We use the cv::LUT() function of the core module.

Mat lookUpTable(1, 256, CV_8U);
uchar* p = lookUpTable.ptr();
for( int i = 0; i < 256; ++i)
    p[i] = table[i];
LUT(I, lookUpTable, J);

Performance Difference

We can conclude a couple of things. If possible, use the already made functions of OpenCV (instead of reinventing these). The fastest method turns out to be the LUT function. This is because the OpenCV library is multi-thread enabled via Intel Threading Building Blocks. However, if you need to write a simple image scan, prefer the pointer method. The iterator is a safer bet, although considerably slower. Using the on-the-fly reference access method for a full image scan is the most costly in debug mode. In release mode it may or may not beat the iterator approach; either way it sacrifices the safety of iterators.

Mask operations on matrices

The idea is that we recalculate each pixel's value in an image according to a mask matrix (also known as a kernel). This mask holds values that adjust how much influence neighboring pixels (and the current pixel) have on the new pixel value. From a mathematical point of view we make a weighted average, with our specified values.
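
As an illustration of the idea, here is a minimal sketch in plain C++ that applies a 3x3 mask to a single-channel image stored as nested vectors. The Image alias and the applyMask3x3 name are assumptions made for this sketch; border pixels are simply copied, which is only one of several possible border policies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // single-channel image, values 0..255

// Apply a 3x3 mask to the interior pixels of a single-channel image.
// Border pixels are copied unchanged; results are clamped to [0, 255].
Image applyMask3x3(const Image& src, const int (&mask)[3][3]) {
    const int rows = static_cast<int>(src.size());
    assert(rows >= 3);
    const int cols = static_cast<int>(src[0].size());
    assert(cols >= 3);
    Image dst = src;
    for (int r = 1; r < rows - 1; ++r) {
        for (int c = 1; c < cols - 1; ++c) {
            int sum = 0;
            // Weighted sum of the pixel and its eight neighbors.
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc)
                    sum += mask[dr + 1][dc + 1] * src[r + dr][c + dc];
            dst[r][c] = std::max(0, std::min(255, sum));
        }
    }
    return dst;
}
```

With the mask {0, -1, 0, -1, 5, -1, 0, -1, 0} used for the matrix C earlier, a uniform region is left unchanged (5 times the value minus 4 times the value), while a pixel brighter than its neighbors is pushed further up.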

Operations with images


Mat img = imread(filename);
Mat img = imread(filename, IMREAD_GRAYSCALE);
imwrite(filename, img);


  • imread: the format of the file is determined by its content (the first few bytes).
  • imwrite: the format of the file is determined by its extension.
  • Use imdecode and imencode to read and write an image from/to memory rather than a file.

Basic operations with images

Accessing pixel intensity values

In order to get a pixel intensity value, you have to know the type of the image and the number of channels. Here is an example for a single channel grey scale image (type 8UC1) and pixel coordinates x and y:

Scalar intensity = img.at<uchar>(y, x);

intensity.val[0] contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV images are represented by the same structure as matrices, we use the same convention for both cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or x-coordinate) follows it. Alternatively, you can use the following notation:

Scalar intensity = img.at<uchar>(Point(x, y));


I did not quite follow this: is the row coordinate given by y and the column coordinate by x?

Now let us consider a 3 channel image with BGR color ordering (the default format returned by imread):

Vec3b intensity = img.at<Vec3b>(y, x);
uchar blue = intensity.val[0];
uchar green = intensity.val[1];
uchar red = intensity.val[2];

You can use the same method for floating-point images (for example, you can get such an image by running Sobel on a 3 channel image):

Vec3f intensity = img.at<Vec3f>(y, x);
float blue = intensity.val[0];
float green = intensity.val[1];
float red = intensity.val[2];

The same method can be used to change pixel intensities:

img.at<uchar>(y, x) = 128;

There are functions in OpenCV, especially from calib3d module, such as projectPoints, that take an array of 2D or 3D points in the form of Mat. Matrix should contain exactly one column, each row corresponds to a point, matrix type should be 32FC2 or 32FC3 correspondingly. Such a matrix can be easily constructed from std::vector:

vector<Point2f> points;
//... fill the array
Mat pointsMat = Mat(points);

One can access a point in this matrix using the same method, Mat::at:

Point2f point = pointsMat.at<Point2f>(i, 0);

Memory management and reference counting

Mat is a structure that keeps matrix/image characteristics (rows and columns number, data type etc) and a pointer to data. So nothing prevents us from having several instances of Mat corresponding to the same data. A Mat keeps a reference count that tells if data has to be deallocated when a particular instance of Mat is destroyed. Here is an example of creating two matrices without copying data:

std::vector<Point3f> points;
// .. fill the array
Mat pointsMat = Mat(points).reshape(1);

As a result we get a 32FC1 matrix with 3 columns instead of a 32FC3 matrix with 1 column. pointsMat uses data from points and will not deallocate the memory when destroyed. In this particular instance, however, the developer has to make sure that the lifetime of points is longer than that of pointsMat. If we need to copy the data, this is done using, for example, cv::Mat::copyTo or cv::Mat::clone:


Since there is reference counting, why must the developer still ensure that the lifetime of points is longer than that of pointsMat?


32FC1 means that each pixel value is stored as one channel floating point with single precision.

Mat img = imread("image.jpg");
Mat img1 = img.clone();

Contrary to the C API, where an output image had to be created by the developer, an empty output Mat can be supplied to each function. Each implementation calls Mat::create for a destination matrix. This method allocates data for a matrix if it is empty. If it is not empty and has the correct size and type, the method does nothing. If, however, the size or type differs from the input arguments, the data is deallocated (and lost) and new data is allocated. For example:

Mat img = imread("image.jpg");
Mat sobelx;
Sobel(img, sobelx, CV_32F, 1, 0);

Primitive operations

There are a number of convenient operators defined on a matrix. For example, here is how we can make a black image from an existing greyscale image img:

img = Scalar(0);

Selecting a region of interest:

Rect r(10, 10, 100, 100);
Mat smallImg = img(r);

A conversion from Mat to C API data structures:

Mat img = imread("image.jpg");
IplImage img1 = img;
CvMat m = img;

Note that there is no data copying here.

Conversion from color to grey scale:

Mat img = imread("image.jpg"); // loading a 8UC3 image
Mat grey;
cvtColor(img, grey, COLOR_BGR2GRAY);

Change image type from 8UC1 to 32FC1:

src.convertTo(dst, CV_32F);

Visualizing images

It is very useful to see intermediate results of your algorithm during the development process. OpenCV provides a convenient way of visualizing images. An 8U image can be shown using:

Mat img = imread("image.jpg");
namedWindow("image", WINDOW_AUTOSIZE);
imshow("image", img);
waitKey(); // wait for a key stroke in the window

A call to waitKey() starts a message passing cycle that waits for a key stroke in the “image” window. A 32F image needs to be converted to 8U type. For example:

Mat img = imread("image.jpg");
Mat grey;
cvtColor(img, grey, COLOR_BGR2GRAY);
Mat sobelx;
Sobel(grey, sobelx, CV_32F, 1, 0);
double minVal, maxVal;
minMaxLoc(sobelx, &minVal, &maxVal); //find minimum and maximum intensities
Mat draw;
sobelx.convertTo(draw, CV_8U, 255.0/(maxVal - minVal), -minVal * 255.0/(maxVal - minVal));
namedWindow("image", WINDOW_AUTOSIZE);
imshow("image", draw);
waitKey(); // wait for a key stroke in the window


TODO What is Sobel used for?

Adding (blending) two images using OpenCV


In this tutorial you will learn:

  • what is linear blending and why it is useful;
  • how to add two images using addWeighted()


Note: The explanation below belongs to the book Computer Vision: Algorithms and Applications by Richard Szeliski

From our previous tutorial, we know already a bit of Pixel operators. An interesting dyadic (two-input) operator is the linear blend operator:

\[g(x) = (1 - \alpha) f_0(x) + \alpha f_1(x)\]

By varying α from 0→1 this operator can be used to perform a temporal cross-dissolve between two images or videos, as seen in slide shows and film productions (cool, eh?)

Warning: Since we are adding src1 and src2, they both have to be of the same size (width and height) and type.

Now we need to generate the g(x) image. For this, the function addWeighted() comes quite handy:

beta = ( 1.0 - alpha );
addWeighted( src1, alpha, src2, beta, 0.0, dst);

since addWeighted() produces:

\[dst = \alpha \cdot src1 + \beta \cdot src2 + \gamma\]

In this case, gamma is the argument 0.0 in the code above.
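
To show what this weighted sum does per value, here is a minimal sketch in plain C++ for flat 8-bit buffers. It is not the OpenCV implementation: the blend name is an assumption, and the rounding at exact halves may differ from what addWeighted() does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// dst = alpha * src1 + beta * src2 + gamma, rounded and clamped to [0, 255].
std::vector<unsigned char> blend(const std::vector<unsigned char>& src1, double alpha,
                                 const std::vector<unsigned char>& src2, double beta,
                                 double gamma) {
    assert(src1.size() == src2.size());  // both inputs must have the same size
    std::vector<unsigned char> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        const double v = alpha * src1[i] + beta * src2[i] + gamma;
        const long rounded = std::lround(v);
        dst[i] = static_cast<unsigned char>(std::min(255L, std::max(0L, rounded)));
    }
    return dst;
}
```

Calling blend(a, alpha, b, 1.0 - alpha, 0.0) for alpha going from 0 to 1 moves the result from b to a, which is the cross-dissolve described above.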

Changing the contrast and brightness of an image


In this tutorial you will learn how to:

  • Access pixel values
  • Initialize a matrix with zeros
  • Learn what cv::saturate_cast does and why it is useful
  • Get some cool info about pixel transformations
  • Improve the brightness of an image on a practical example

Image Processing

  • A general image processing operator is a function that takes one or more input images and produces an output image.
  • Image transforms can be seen as:
      - Point operators (pixel transforms)
      - Neighborhood (area-based) operators

Pixel Transforms

In this kind of image processing transform, each output pixel’s value depends on only the corresponding input pixel value (plus, potentially, some globally collected information or parameters).

Examples of such operators include brightness and contrast adjustments as well as color correction and transformations.

Brightness and contrast adjustments

Two commonly used point processes are multiplication and addition with a constant:

\[g(x) = \alpha f(x) + \beta\]

The parameters α>0 and β are often called the gain and bias parameters; sometimes these parameters are said to control contrast and brightness respectively.
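
Here is a minimal sketch of a gain and bias adjustment of the form g(x) = αf(x) + β on a flat 8-bit buffer, in plain C++. The helper saturateToUchar is a hypothetical stand-in that plays the role of cv::saturate_cast<uchar> (round, then clamp to 0..255); it is not the library function itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper playing the role of cv::saturate_cast<uchar>:
// round to the nearest integer, then clamp to [0, 255].
unsigned char saturateToUchar(double v) {
    const long rounded = std::lround(v);
    return static_cast<unsigned char>(std::min(255L, std::max(0L, rounded)));
}

// Apply g(x) = alpha * f(x) + beta to every value of a flat 8-bit buffer.
std::vector<unsigned char> adjustContrastBrightness(const std::vector<unsigned char>& src,
                                                    double alpha, double beta) {
    assert(alpha > 0.0);  // the gain is expected to be positive
    std::vector<unsigned char> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = saturateToUchar(alpha * src[i] + beta);
    return dst;
}
```

Without the clamp, a result such as 300 would wrap around when stored in an 8-bit value and show up as a dark pixel, which is why the saturation step matters.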

Practical example

In this section, we will put into practice what we have learned to correct an underexposed image by adjusting the brightness and the contrast of the image. We will also see another technique to correct the brightness of an image called gamma correction.

Increasing (/ decreasing) the β value will add (/ subtract) a constant value to every pixel. Pixel values outside of the [0 ; 255] range will be saturated (i.e. a pixel value higher (/ lower) than 255 (/ 0) will be clamped to 255 (/ 0)).


In light gray, histogram of the original image, in dark gray when brightness = 80 in Gimp

The histogram represents for each color level the number of pixels with that color level. A dark image will have many pixels with a low color value and thus the histogram will present a peak in its left part. When adding a constant bias, the histogram is shifted to the right as we have added a constant bias to all the pixels.

The α parameter will modify how the levels spread. If α<1, the color levels will be compressed and the result will be an image with less contrast.


Note that these histograms have been obtained using the Brightness-Contrast tool in the Gimp software. The brightness tool should be identical to the β bias parameter, but the contrast tool seems to differ from the α gain in that the output range seems to be centered with Gimp (as you can notice in the previous histogram).

It can occur that playing with the β bias will improve the brightness but at the same time the image will appear with a slight veil as the contrast will be reduced. The α gain can be used to diminish this effect but, due to the saturation, we will lose some details in the original bright regions.

Gamma correction

Gamma correction can be used to correct the brightness of an image by using a non linear transformation between the input values and the mapped output values:

\[O = \left( \frac{I}{255} \right)^{\gamma} \times 255\]

As this relation is non linear, the effect will not be the same for all the pixels and will depend on their original value.


When γ<1, the original dark regions will be brighter and the histogram will be shifted to the right whereas it will be the opposite with γ>1.
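
Since an 8-bit input has only 256 possible values, the gamma mapping O = (I/255)^γ × 255 can be precomputed once as a lookup table. The following is a minimal sketch in plain C++; buildGammaTable is an illustrative name and not an OpenCV function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Precompute O = (I / 255)^gamma * 255 for every possible 8-bit input value.
std::vector<unsigned char> buildGammaTable(double gamma) {
    assert(gamma > 0.0);
    std::vector<unsigned char> table(256);
    for (int i = 0; i < 256; ++i) {
        const double out = std::pow(i / 255.0, gamma) * 255.0;
        // Clamp before rounding to guard against tiny floating-point overshoot.
        const double clamped = std::min(255.0, std::max(0.0, out));
        table[i] = static_cast<unsigned char>(std::lround(clamped));
    }
    return table;
}
```

The end points 0 and 255 map to themselves for any positive γ, so the mapping itself cannot push values out of range; only the values in between are moved up (γ<1) or down (γ>1).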

Correct an underexposed image

The following image has been corrected with: α=1.3 and β=40.


By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons

The overall brightness has been improved but you can notice that the clouds are now greatly saturated due to the numerical saturation of the implementation used (highlight clipping in photography).

The following image has been corrected with: γ=0.4.


By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons

The gamma correction should tend to add less saturation effect as the mapping is non linear and there is no numerical saturation possible as in the previous method.


Left: histogram after alpha, beta correction ; Center: histogram of the original image ; Right: histogram after the gamma correction

The previous figure compares the histograms for the three images (the y-ranges are not the same between the three histograms). You can notice that most of the pixel values are in the lower part of the histogram for the original image. After α, β correction, we can observe a big peak at 255 due to the saturation as well as a shift to the right. After gamma correction, the histogram is shifted to the right but the pixels in the dark regions are more shifted (see the gamma curves figure) than those in the bright regions.

In this tutorial, you have seen two simple methods to adjust the contrast and the brightness of an image. They are basic techniques and are not intended to be used as a replacement of a raster graphics editor!

Discrete Fourier Transform


We’ll seek answers for the following questions:

  • What is a Fourier transform and why use it?
  • How to do it in OpenCV?
  • Usage of functions such as: copyMakeBorder(), merge(), dft(), getOptimalDFTSize(), log() and normalize().

The Fourier Transform will decompose an image into its sine and cosine components. In other words, it will transform an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly with the sum of infinite sine and cosine functions. The Fourier Transform is a way to do this. Mathematically, a two dimensional image’s Fourier transform is:



\[\begin{aligned} F(k,l) &= \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i,j)\, e^{-i 2\pi \left( \frac{ki}{N} + \frac{lj}{N} \right)} \\ e^{ix} &= \cos x + i \sin x \end{aligned}\]

Here f is the image value in its spatial domain and F in its frequency domain. The result of the transformation is complex numbers. Displaying this is possible either via a real image and an imaginary image or via a magnitude and a phase image. However, throughout the image processing algorithms only the magnitude image is interesting as this contains all the information we need about the image’s geometric structure. Nevertheless, if you intend to make some modifications of the image in these forms and then need to retransform it, you’ll need to preserve both of these.


"throughout the image processing algorithms only the magnitude image is interesting as this contains all the information we need about the image’s geometric structure. Nevertheless, ..." TODO I did not understand this sentence.

In this sample I’ll show how to calculate and show the magnitude image of a Fourier Transform. In this case digital images are discrete. This means they may take up a value from a given domain. For example, in a basic gray scale image values usually are between zero and 255. Therefore the Fourier Transform too needs to be of a discrete type, resulting in a Discrete Fourier Transform (DFT). You’ll want to use this whenever you need to determine the structure of an image from a geometrical point of view. Here are the steps to follow (in case of a gray scale input image I):

Expand the image to an optimal size

The performance of a DFT depends on the image size. It tends to be the fastest for image sizes that are multiples of the numbers two, three and five. Therefore, to achieve maximal performance it is generally a good idea to pad border values to the image to get a size with such traits. The getOptimalDFTSize() function returns this optimal size and we can use the copyMakeBorder() function to expand the borders of an image (the appended pixels are initialized with zero):

Mat padded;                            //expand input image to optimal size
int m = getOptimalDFTSize( I.rows );
int n = getOptimalDFTSize( I.cols ); // on the border add zero values
copyMakeBorder(I, padded, 0, m - I.rows, 0, n - I.cols, BORDER_CONSTANT, Scalar::all(0));

Make place for both the complex and the real values

The result of a Fourier Transform is complex. This implies that for each image value the result is two image values (one per component). Moreover, the frequency domain's range is much larger than its spatial counterpart. Therefore, we usually store these at least in a float format. We’ll convert our input image to this type and expand it with another channel to hold the complex values:

Mat planes[] = {Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F)};
Mat complexI;
merge(planes, 2, complexI);         // Add to the expanded another plane with zeros

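Here planes holds two single-channel images of equal size; after merge, complexI is a two-channel image of the same size.


Why is the real part of complexI not initialized as a new matrix, instead of reusing the memory of padded? Also, does reusing the memory mean that the DFT can be computed in place? Yes, the answer is given below.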

Make the Discrete Fourier Transform

An in-place calculation is possible (same input as output):

dft(complexI, complexI); // this way the result may fit in the source matrix

Transform the real and complex values to magnitude

A complex number has a real (Re) and an imaginary (Im) part. The results of a DFT are complex numbers. The magnitude of a DFT is:

\[M = \sqrt {Re(DFT(I))^2 + Im(DFT(I))^2}\]

Translated to OpenCV code:

split(complexI, planes);                   // planes[0] = Re(DFT(I)), planes[1] = Im(DFT(I))
magnitude(planes[0], planes[1], planes[0]);// planes[0] = magnitude
Mat magI = planes[0];

Switch to a logarithmic scale

It turns out that the dynamic range of the Fourier coefficients is too large to be displayed on the screen. We have some small and some high changing values that we can’t observe like this. Therefore the high values will all turn out as white points, while the small ones as black. To use the gray scale values for visualization we can transform our linear scale to a logarithmic one:

\[M_1 = \log{(1 + M)}\]

Translated to OpenCV code:

magI += Scalar::all(1);                    // switch to logarithmic scale
log(magI, magI);                           // void log(InputArray src, OutputArray dst)
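
The magnitude and logarithmic scaling steps can be tried on a tiny one-dimensional signal without OpenCV. The following is a sketch under stated assumptions: naiveDft is a straightforward O(N^2) transform written for illustration (it is not how cv::dft() is implemented), and logMagnitude is a hypothetical helper that combines M = sqrt(Re^2 + Im^2) with log(1 + M).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) one-dimensional DFT: F(k) = sum_n f(n) * exp(-i * 2 * pi * k * n / N).
std::vector<std::complex<double>> naiveDft(const std::vector<double>& signal) {
    assert(!signal.empty());
    const double pi = std::acos(-1.0);
    const std::size_t N = signal.size();
    std::vector<std::complex<double>> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * n) / static_cast<double>(N);
            sum += signal[n] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Magnitude M = sqrt(Re^2 + Im^2), followed by the log scale log(1 + M).
std::vector<double> logMagnitude(const std::vector<std::complex<double>>& spectrum) {
    std::vector<double> result;
    result.reserve(spectrum.size());
    for (const std::complex<double>& c : spectrum)
        result.push_back(std::log(1.0 + std::abs(c)));
    return result;
}
```

For a constant signal all the energy lands in the first coefficient, so the log magnitude is log(1 + N) at index 0 and close to zero everywhere else. Adding 1 before taking the logarithm keeps a zero magnitude at zero instead of minus infinity.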

Crop and rearrange

Remember that in the first step we expanded the image? Well, it’s time to throw away the newly introduced values. For visualization purposes we may also rearrange the quadrants of the result, so that the origin (zero, zero) corresponds with the image center.

// crop the spectrum, if it has an odd number of rows or columns
magI = magI(Rect(0, 0, magI.cols & -2, magI.rows & -2));
// rearrange the quadrants of Fourier image so that the origin is at the image center
int cx = magI.cols/2;
int cy = magI.rows/2;
Mat q0(magI, Rect(0, 0, cx, cy));   // Top-Left - Create a ROI per quadrant
Mat q1(magI, Rect(cx, 0, cx, cy));  // Top-Right
Mat q2(magI, Rect(0, cy, cx, cy));  // Bottom-Left
Mat q3(magI, Rect(cx, cy, cx, cy)); // Bottom-Right
Mat tmp;                           // swap quadrants (Top-Left with Bottom-Right)
q0.copyTo(tmp);
q3.copyTo(q0);
tmp.copyTo(q3);
q1.copyTo(tmp);                    // swap quadrant (Top-Right with Bottom-Left)
q2.copyTo(q1);
tmp.copyTo(q2);

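magI.cols & -2 subtracts 1 if the value is odd and leaves an even value unchanged.

cx, cy is the center position; q0 through q3 are the four blocks the matrix is cut into; ROI stands for region of interest.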
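
Both tricks can be checked in isolation with plain C++. This is a minimal sketch: roundDownToEven and swapQuadrants are illustrative names, and Grid is a nested vector standing in for the magnitude matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// n & -2 clears the lowest bit: odd values drop by one, even values are unchanged.
int roundDownToEven(int n) { return n & -2; }

using Grid = std::vector<std::vector<int>>;

// Swap top-left with bottom-right and top-right with bottom-left, in place.
// Requires an even number of rows and columns, which is why the crop comes first.
void swapQuadrants(Grid& g) {
    assert(!g.empty() && g.size() % 2 == 0 && g[0].size() % 2 == 0);
    const std::size_t cy = g.size() / 2;
    const std::size_t cx = g[0].size() / 2;
    for (std::size_t r = 0; r < cy; ++r) {
        for (std::size_t c = 0; c < cx; ++c) {
            std::swap(g[r][c], g[r + cy][c + cx]);  // q0 <-> q3
            std::swap(g[r][c + cx], g[r + cy][c]);  // q1 <-> q2
        }
    }
}
```

On a 2x2 grid {{1, 2}, {3, 4}} the swap gives {{4, 3}, {2, 1}}: the value that was at the origin ends up next to the center.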


This is done again for visualization purposes. We now have the magnitudes; however, these are still out of our image display range of zero to one. We normalize our values to this range using the cv::normalize() function.

normalize(magI, magI, 0, 1, NORM_MINMAX); // Transform the matrix with float values into a
                                        // viewable image form (float between values 0 and 1).
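
The min-max scaling used here is simple enough to sketch by hand. The following plain C++ function is an assumption-labeled illustration of the idea, not the implementation behind cv::normalize(); the name normalizeMinMax and the handling of a constant input are our own choices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Linearly map values so that the smallest becomes lo and the largest becomes hi.
std::vector<double> normalizeMinMax(const std::vector<double>& values, double lo, double hi) {
    assert(!values.empty() && lo < hi);
    const auto mm = std::minmax_element(values.begin(), values.end());
    const double minV = *mm.first;
    const double maxV = *mm.second;
    std::vector<double> out(values.size(), lo);
    if (maxV == minV)
        return out;  // constant input: map everything to lo (a choice made for this sketch)
    const double scale = (hi - lo) / (maxV - minV);
    for (std::size_t i = 0; i < values.size(); ++i)
        out[i] = lo + (values[i] - minV) * scale;
    return out;
}
```

The same scale-and-shift idea appears in the earlier convertTo() call, where 255.0/(maxVal - minVal) is the scale and -minVal * 255.0/(maxVal - minVal) is the shift.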


An application idea would be to determine the geometrical orientation present in the image. For example, let us find out whether a text is horizontal or not. Looking at some text you’ll notice that the text lines form sort of horizontal lines and the letters form sort of vertical lines. These two main components of a text snippet may also be seen in the Fourier transform. Let us use this horizontal and this rotated image of a text.

In case of the horizontal text:


In case of a rotated text:


You can see that the most influential components of the frequency domain (brightest dots on the magnitude image) follow the geometric rotation of objects on the image. From this we may calculate the offset and perform an image rotation to correct possible misalignments.


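When you tap Edit > Rotate on an iPhone photo, does the automatic straightening feature use this same principle? I had always assumed it just looked for a horizontal line.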


TODO I understand how to apply this, but not why the Fourier transform followed by swapping the ROIs produces an image like this.