Memory management in MATLAB
Tyler Dare, 22 May 2015
Many high-channel-count measurements output files that are too large to load into memory at one time. Here are some strategies for loading, saving, and processing large files without crashing your system or generating OUT OF MEMORY errors.
Getting machine RAM programmatically
It can be helpful to get the RAM of your machine programmatically, especially if your code will be run on multiple systems.
[userMem, sysMem] = memory;                % Get memory statistics on your MATLAB session and the current system
                                           % (avoid naming the output "system", which shadows the built-in function)
bytes = sysMem.PhysicalMemory.Available;   % Get the RAM (in bytes) that isn't being used by other programs
available_RAM_GB = bytes/1024^3
available_RAM_GB =

    9.4970
I have had good luck limiting variables to about 25% of available RAM.
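As a rough sketch of that rule of thumb (the guard and the example array size are illustrative, not from the original note), you can check a planned allocation against the budget before creating the array:

[~, sysMem] = memory;
maxBytes = 0.25 * sysMem.PhysicalMemory.Available;   % 25% of currently available RAM
plannedBytes = 8 * 5000 * 5000;                      % e.g., a 5000x5000 array of doubles (8 bytes each)
if plannedBytes > maxBytes
    warning('Planned array would exceed 25%% of available RAM.')
end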
Subfunctions can temporarily create duplicate copies of variables and unexpectedly double RAM usage. For more information, see
http://www.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
Saving data as singles instead of doubles
Most data acquisition systems (including National Instruments hardware) use 24-bit analog-to-digital converters. Single-precision floating-point numbers have a 24-bit significand, so no information is lost if 24-bit data is saved as singles instead of doubles, thereby cutting disk space usage in half.
You can initialize an array of singles using
S = zeros(1000,1000,'single');
Or if you want an array of complex singles (for example, from a CPSD matrix) you can use
S = zeros(1000,1000,'like',single(1i));
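If the data was acquired as doubles, it can be converted before saving. A minimal sketch, with hypothetical variable and file names:

x = randn(1,1e6);              % stand-in for acquired data (hypothetical)
xSingle = single(x);           % convert to single precision
save('myDataFile','xSingle')   % the saved variable takes half the disk space of the double version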
Load only parts of *.mat files
If the *.mat file you want to read has multiple large variables in it, you can read only some of them. First, find out what variables are in the file:
whos -file myDataFile
  Name           Size            Bytes  Class     Attributes

  dataSet1    1000x1000        4000000  single
  dataSet2    1000x1000        4000000  single
Then for each large variable you can load it, do some processing, and clear it before moving on to the next:
load('myDataFile','dataSet1')
m(1) = norm(dataSet1);
clear dataSet1
load('myDataFile','dataSet2')
m(2) = norm(dataSet2);
clear dataSet2
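When a file holds many such variables, the same pattern can be looped. A minimal sketch, assuming each variable should get the same processing:

info = whos('-file','myDataFile');           % list the variables without loading them
for ii = 1:numel(info)
    S = load('myDataFile', info(ii).name);   % load one variable into a scalar struct
    m(ii) = norm(S.(info(ii).name));         % process it via a dynamic field name
    clear S                                  % free the memory before the next variable
end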
Random access to *.mat files
Recent versions of MATLAB support random access to *.mat files, which lets you load only parts of variables. The *.mat file must be saved in the correct format.
Use the option 'v7.3' to save as the version that allows random access:
fs = 44100;
t = 0:1/fs:100;
x = sin(2*pi*100*t);                            % Generate a 100 s long time history
save('longTimeHistory','fs','t','x','-v7.3')    % Save the large data file using the v7.3 option
clear all
To load only part of the time history, first create a mat object
matObj = matfile('longTimeHistory')
matObj =

  matlab.io.MatFile

  Properties:
      Properties.Source: 'Z:\Notes\longTimeHistory.mat'
    Properties.Writable: false
                     fs: [1x1 double]
                      t: [1x4410001 double]
                      x: [1x4410001 double]
Use the mat object like a structure to access the variables
fs = matObj.fs;                  % Read the sample rate
dataChunk = matObj.x(1,1:fs);    % Read the first second of the data into memory
timeChunk = matObj.t(1,1:fs);    % Read the first second of the time vector into memory
                                 % (even though t is a vector, you must specify both dimensions)
Note that mat objects can be read only in contiguous chunks of data, and that using the 'end' keyword causes MATLAB to load the entire variable. For more information, see
http://www.mathworks.com/help/matlab/ref/matfile.html
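Putting this together, here is a minimal sketch of chunked processing (the RMS computation and chunking scheme are illustrative, not from the original note) that works through the long time history one contiguous second at a time:

matObj = matfile('longTimeHistory');
fs = matObj.fs;
[~, n] = size(matObj, 'x');          % get the length of x without loading it
nChunks = floor(n/fs);
rmsPerSecond = zeros(1, nChunks);
for ii = 1:nChunks
    idx = (ii-1)*fs + (1:fs);        % contiguous one-second chunk of samples
    chunk = matObj.x(1, idx);        % only this chunk is read from disk
    rmsPerSecond(ii) = sqrt(mean(chunk.^2));
end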
One important note is that v7.3 files are row-major, instead of column-major like most MATLAB storage. This can make a big difference in the speed of loading variables.
A = rand(10000,10000);
save('largeDataSet','A','-v7.3');      % Create a large variable and save it
clear A
matObj = matfile('largeDataSet');
for ii = 1:10
    tic
    dataColumn = matObj.A(:,ii);       % Load one column of A (columns 1-10 over the loop)
    tLoadColumn(ii) = toc;             % Measure how long column reading takes
    tic
    dataRow = matObj.A(ii,:);          % Load one row of A (rows 1-10 over the loop)
    tLoadRow(ii) = toc;                % Measure how long row reading takes
end
averageColumnReadTime = mean(tLoadColumn)
averageRowReadTime = mean(tLoadRow)
averageColumnReadTime =

    2.6365

averageRowReadTime =

    0.6252
For this case, row reading was significantly faster than column reading on average. If reading large data sets is much slower than you would like, it may be worth transposing some of your large matrices so that you can read them efficiently.
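A minimal sketch of that workaround (file and variable names are hypothetical):

A = rand(10000,10000);
At = A.';                            % transpose once, so former columns become rows
save('largeDataSetT','At','-v7.3');
clear A At
matObjT = matfile('largeDataSetT');
formerColumn = matObjT.At(1,:);      % this fast row read returns what was column 1 of A

The transpose costs one full pass through the data up front, so it only pays off if you will read along the slow dimension many times.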