Home > mrg > MRG_MIKE > mrg_dfsu_stats.m

mrg_dfsu_stats

PURPOSE ^

Calculates some basic statistics from objects in DFSU files.

SYNOPSIS ^

function [data] = mrg_dfsu_stats(chunk_size)

DESCRIPTION ^

 Calculates some basic statistics from objects in DFSU files.  

 INPUT
   chunk_size      An integer specifying the size of the 'chunks' to break 
                   the file up into.  Larger chunks may speed up runtime, 
                   but very large chunks may lead to out of memory errors.  
                   Defaults to 1000.

 OUTPUT
   Returns a MATLAB structure with 10 items: X, Y and Z data from the DFSU
   file plus the 7 statistics calculated.
   Writes a CSV file, derived from the MATLAB structure.
   Writes a DFSU file with the same spatial extent as the analysed file,
   but with a single timestep will be written as
   'inputfilename_itemselected_stats.dfsu'.

 REQUIREMENTS
   The DHI/MIKE Matlab toolbox 2011 (developed with v. 20110304)
   mrg_struct_to_csv.m function (assuming you want csv output, else it will be skipped)

 NOTES
   The function performs calculations from each cell in a DFSU file (i.e.
   an analysis through the time domain). Currently it calculates:
       - mean (using nanmean, i.e. ignoring delete values)
       - median (using nanmedian, i.e. ignoring delete values)
       - standard deviation (using nanstd. Uses n-1 as the denominator)
       - nanmin (using nanmin)
       - nanmax (using nanmax)
       - n (ignoring nans)
       - n_total (including nans - should be equal to number of timesteps)

 LICENCE
   Created by Daniel Pritchard (www.pritchard.co)
   Distributed under a creative commons CC BY-SA licence. See here:
   http://creativecommons.org/licenses/by-sa/3.0/

 DEVELOPMENT
   v 1.0   12/7/2012
           DP. Initial attempt and distribution. 
   v 1.2   19/7/2012
           DP. Now deals with delete values by using single() throughout 
           (use of double() causes issues with rounding).
           DP. Much better estimation of time remaining.
   v 1.3   14/02/2013
           DP. Documentation.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [data] = mrg_dfsu_stats(chunk_size)
0002 % Calculates some basic statistics from objects in DFSU files.
0003 %
0004 % INPUT
0005 %   chunk_size      An integer specifying the size of the 'chunks' to break
0006 %                   the file up into.  Larger chunks may speed up runtime,
0007 %                   but very large chunks may lead to out of memory errors.
0008 %                   Defaults to 1000.
0009 %
0010 % OUTPUT
0011 %   Returns a MATLAB structure with 10 items: X, Y and Z data from the DFSU
0012 %   file plus the 7 statistics calculated.
0013 %   Writes a CSV file, derived from the MATLAB structure.
0014 %   Writes a DFSU file with the same spatial extent as the analysed file,
0015 %   but with a single timestep will be written as
0016 %   'inputfilename_itemselected_stats.dfsu'.
0017 %
0018 % REQUIREMENTS
0019 %   The DHI/MIKE Matlab toolbox 2011 (developed with v. 20110304)
0020 %   mrg_struct_to_csv.m function (assuming you want csv output, else it will be skipped)
0021 %
0022 % NOTES
0023 %   The function performs calculations from each cell in a DFSU file (i.e.
0024 %   an analysis through the time domain). Currently it calculates:
0025 %       - mean (using nanmean, i.e. ignoring delete values)
0026 %       - median (using nanmedian, i.e. ignoring delete values)
0027 %       - standard deviation (using nanstd. Uses n-1 as the denominator)
0028 %       - nanmin (using nanmin)
0029 %       - nanmax (using nanmax)
0030 %       - n (ignoring nans)
0031 %       - n_total (including nans - should be equal to number of timesteps)
0032 %
0033 % LICENCE
0034 %   Created by Daniel Pritchard (www.pritchard.co)
0035 %   Distributed under a creative commons CC BY-SA licence. See here:
0036 %   http://creativecommons.org/licenses/by-sa/3.0/
0037 %
0038 % DEVELOPMENT
0039 %   v 1.0   12/7/2012
0040 %           DP. Initial attempt and distribution.
0041 %   v 1.2   19/7/2012
0042 %           DP. Now deals with delete values by using single() throughout
0043 %           (use of double() causes issues with rounding).
0044 %           DP. Much better estimation of time remaining.
0045 %   v 1.3   14/02/2013
0046 %           DP. Documentation.
0047 
0048 %% Go!
0049 %Chunk size...
0050 if ~exist('chunk_size', 'var')
0051     chunk_size = 1000;
0052 end
0053 
0054 % Load libraries
0055 NET.addAssembly('DHI.Generic.MikeZero.DFS');
0056 NET.addAssembly('DHI.Generic.MikeZero.EUM');
0057 import DHI.Generic.MikeZero.DFS.*;
0058 import DHI.Generic.MikeZero.DFS.dfsu.*;
0059 import DHI.Generic.MikeZero.*
0060 
0061 % Get file
0062 [filename,filepath] = uigetfile('*.dfsu','Select the .dfsu file to analyse');
0063 infile = [filepath,filename];
0064 dfsu_file = DfsFileFactory.DfsuFileOpen(infile);
0065 
0066 % Read some item information
0067 items = {};
0068 for i = 0:dfsu_file.ItemInfo.Count-1
0069     item = dfsu_file.ItemInfo.Item(i);
0070     items{i+1,1} = char(item.Name);
0071     items{i+1,2} = char(item.Quantity.Unit);
0072     items{i+1,3} = char(item.Quantity.UnitAbbreviation);
0073 end
0074 
0075 % Ask the user to select the item to process
0076 choice = menu(sprintf('Which of these objects do you want to process?'),items{:,1}, 'Ummm... None of the above!');
0077 if choice == 0 || choice == length(items(:,1))+1
0078     error('Do or do not.  There is no try.');
0079 else
0080     dfsu_item = choice; % A number delimiting the item position in the DFSU file.  The same as mat_item (i.e. no zero indexing).
0081 end
0082 
0083 nsteps = dfsu_file.NumberOfTimeSteps;
0084 nelements = dfsu_file.NumberOfElements;
0085 
0086 % Setup the output object
0087 data = struct();
0088 
0089 % Add some nicities, like X, Y, Z
0090 node_x = single(dfsu_file.X);
0091 node_y = single(dfsu_file.Y);
0092 node_z = single(dfsu_file.Z);
0093 ele_table = mzNetFromElmtArray(dfsu_file.ElementTable);
0094 [ele_X,ele_Y,ele_Z] = mzCalcElmtCenterCoords(ele_table,node_x,node_y,node_z);
0095 data.X = ele_X.';
0096 data.Y = ele_Y.';
0097 data.Z = ele_Z.';
0098 
0099 % Add (empty) objects for the stats
0100 data.mean = NaN(1,nelements, 'single');
0101 data.median = NaN(1,nelements, 'single');
0102 data.stdev = NaN(1,nelements, 'single');
0103 data.min = NaN(1,nelements, 'single');
0104 data.max = NaN(1,nelements, 'single');
0105 data.n = NaN(1,nelements, 'single');
0106 data.ntotal = NaN(1,nelements, 'single');
0107 
0108 % Pre-allocate some memory, and setup our sequence...
0109 temp_single_ts = NaN(1,nelements,'single');
0110 temp_el_chunk = NaN(chunk_size,nsteps,'single');
0111 seq = int32(1:chunk_size:nelements);
0112 seq_len = size(seq,2);
0113 seq(end+1) = int32(nelements);
0114 
0115 % Single delete value
0116 sDelete = single(1e-35);
0117 
0118 % Timings are for debugging and display only...
0119 timings = NaN(1,seq_len);
0120 
0121 % loc = Keeping track of the loctaion in the data file where we are at
0122 loc = 0;
0123 % OK this is ugly, but I think I implimented this after 4 beers at about
0124 % 10 pm sitting at a coffee table in a rented apartment in Denmark.  If you
0125 % wanted professional code you wouldn't rely on drunk, tired, algal
0126 % physiologists!
0127 
0128 for j=1:seq_len
0129     tic
0130     if j == seq_len
0131         %This is the last chunk.  Act accordingly.
0132         last_chunk_size = seq(j+1)-seq(j)+1;
0133         temp_el_chunk = ones(last_chunk_size,nsteps);
0134         for i=0:nsteps-1
0135             temp_single_ts(:) = single(dfsu_file.ReadItemTimeStep(dfsu_item,i).Data)';
0136             temp_el_chunk(:,i+1) = temp_single_ts(seq(j):seq(j+1));
0137         end
0138         
0139         % OK now we have a chunk.  We need to process it...
0140         nel_in_chunk = size(temp_el_chunk,1);
0141         disp(['Processing elements ', num2str(loc+1), ' to ' num2str(size(temp_el_chunk,1)+loc), ' of ', num2str(nelements)])
0142         
0143         % Convert delete values to NANs
0144         temp_el_chunk(temp_el_chunk==sDelete) = NaN;
0145         
0146         data.mean(loc+1:size(temp_el_chunk,1)+loc) = nanmean(temp_el_chunk,2);
0147         data.median(loc+1:size(temp_el_chunk,1)+loc) = nanmedian(temp_el_chunk,2);
0148         data.stdev(loc+1:size(temp_el_chunk,1)+loc) = nanstd(temp_el_chunk,0,2);
0149         data.min(loc+1:size(temp_el_chunk,1)+loc) = nanmin(temp_el_chunk,[],2);
0150         data.max(loc+1:size(temp_el_chunk,1)+loc) = nanmax(temp_el_chunk,[],2);
0151         data.n(loc+1:size(temp_el_chunk,1)+loc) = sum(~isnan(temp_el_chunk),2);
0152         if size(temp_el_chunk,2) ~= nsteps
0153             error('Somehow the total number of values does not equal the number of timesteps')
0154         end
0155         data.ntotal(loc+1:size(temp_el_chunk,1)+loc) = repmat(size(temp_el_chunk,2), size(temp_el_chunk,1), 1);
0156         
0157         loc = size(temp_el_chunk,1)+loc;
0158     else
0159         for i=0:nsteps-1
0160             temp_single_ts(:) = single(dfsu_file.ReadItemTimeStep(dfsu_item,i).Data)';
0161             temp_el_chunk(:,i+1) = temp_single_ts(seq(j):seq(j+1)-1);
0162         end
0163         % OK now we have a chunk.  We need to process it...
0164         nel_in_chunk = size(temp_el_chunk,1);
0165         disp(['Processing elements ', num2str(loc+1), ' to ' num2str(size(temp_el_chunk,1)+loc), ' of ', num2str(nelements)])
0166         
0167         % Convert delete values to NANs
0168         temp_el_chunk(temp_el_chunk==sDelete) = NaN;
0169         
0170         data.mean(loc+1:size(temp_el_chunk,1)+loc) = nanmean(temp_el_chunk,2);
0171         data.median(loc+1:size(temp_el_chunk,1)+loc) = nanmedian(temp_el_chunk,2);
0172         data.stdev(loc+1:size(temp_el_chunk,1)+loc) = nanstd(temp_el_chunk,0,2);
0173         data.min(loc+1:size(temp_el_chunk,1)+loc) = nanmin(temp_el_chunk,[],2);
0174         data.max(loc+1:size(temp_el_chunk,1)+loc) = nanmax(temp_el_chunk,[],2);
0175         data.n(loc+1:size(temp_el_chunk,1)+loc) = sum(~isnan(temp_el_chunk),2);
0176         if size(temp_el_chunk,2) ~= nsteps
0177             error('Somehow the total number of values does not equal the number of timesteps')
0178         end
0179         data.ntotal(loc+1:size(temp_el_chunk,1)+loc) = repmat(size(temp_el_chunk,2), size(temp_el_chunk,1), 1);
0180 
0181         loc = size(temp_el_chunk,1)+loc;
0182         
0183     end
0184     timings(j) = toc;
0185     mean_time_sec = nanmean(timings);
0186     remain = (seq_len-j)*mean_time_sec/60;
0187     disp(['Approximatly ', num2str(remain), ' minutes remain.'])
0188 end
0189 
0190 %% Here we write out a CSV file
0191 if exist('mrg_struct_to_csv.m', 'file') == 2
0192     disp('Writing statistics to a CSV file.')
0193     out_file = [filename(1:end-5),'_',regexprep(items{choice,1},' ',''),'_stats.csv'];
0194     out_file_full = [filepath,out_file];
0195     mrg_struct_to_csv(data, out_file_full);
0196     disp(['Output CSV file (', out_file, ') written successfully.'])
0197 else
0198     disp('mrg_struct_to_csv.m not avilable in your search path.  No CSV file will be generated.')
0199 end
0200 
0201 %% Here we  stick data into a DFSU file
0202 % First DRAFT of the DFSU - based output...  Seems to work OK...
0203 disp('Writing statistics to a DFSU file.')
0204 
0205 % Apparently we need to load these again...  Hmmm... TODO: Figure this out
0206 NET.addAssembly('DHI.Generic.MikeZero');
0207 import DHI.Generic.MikeZero.*
0208 import DHI.Generic.MikeZero.EUM.*
0209 import DHI.Generic.MikeZero.DFS.*;
0210 import DHI.Generic.MikeZero.DFS.dfsu.*;
0211 
0212 builder = DfsuBuilder.Create(DfsuFileType.Dfsu2D);
0213 builder.SetTimeInfo(dfsu_file.StartDateTime, 1);
0214 builder.SetNodes(dfsu_file.X,dfsu_file.Y,dfsu_file.Z,dfsu_file.Code);
0215 builder.SetElements(dfsu_file.ElementTable);
0216 builder.SetProjection(dfsu_file.Projection);
0217 
0218 data_names = fieldnames(data);
0219 data_names(strmatch('X', data_names, 'exact'))=[];
0220 data_names(strmatch('Z', data_names, 'exact'))=[];
0221 data_names(strmatch('Y', data_names, 'exact'))=[];
0222 
0223 % Being lazy here - no need to write out X, Y and Z - TODO: Remove this...
0224 undef = eumQuantity(eumItem.eumIItemUndefined,eumUnit.eumUUnitUndefined);
0225 
0226 for n = 1:length(data_names)
0227     builder.AddDynamicItem(data_names{n},undef);
0228 end
0229 
0230 out_file = [filename(1:end-5),'_',regexprep(items{choice,1},' ',''),'_stats', filename(end-4:end)];
0231 out_file_full = [filepath,out_file];
0232 
0233 dfs_out = builder.CreateFile(out_file_full);
0234 
0235 if ~(size(data.(data_names{1}), 2) == dfs_out.NumberOfElements)
0236     error(['Somehow there is a mismatch in the number of elements in the ',...
0237         'DFSU output file and the length of the data.  This is an error'])
0238 end
0239 
0240 % Follow up from above - Re: Layzness
0241 for n = 1:length(data_names)
0242     data_out = single(data.(data_names{n}));
0243     data_out(isnan(data_out)) = sDelete;
0244     dfs_out.WriteItemTimeStepNext(n-1, NET.convertArray(data_out));
0245 end
0246 
0247 dfs_out.Close()
0248 disp(['Output DFSU file (', out_file, ') written successfully.'])
0249 
0250 end
0251 
0252 
0253 
0254

Generated on Thu 29-May-2014 21:29:53 by m2html © 2005