
How Do I Use A Python Regex To Match The Function Syntax Of Matlab?

I am trying to find all the inputs/outputs of all MATLAB functions in our internal library. I am new (first time) to regex and have been trying to use the multiline mode in Python'

Solution 1:

The peculiar (internal) error you're getting would occur if you pass re.T instead of re.M as the second argument to re.compile (re.template -- a currently undocumented entry -- is the one intended to use it, and, in brief, template REs don't support repetition or backtracking). Can you print re.M to show its value in your code before you call re.compile?
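A quick way to run that check, sketched for Python 3 (where the flags are enum members, so int() is used to show the raw values). The pattern string here is only a placeholder, not the OP's pattern:

```python
import re

# re.M and re.MULTILINE are two names for the same flag; same for re.S / re.DOTALL.
print(int(re.M), int(re.MULTILINE))
print(int(re.S), int(re.DOTALL))

# Each flag is a distinct bit, so flags are combined with "|".
combined = re.M | re.S

# Placeholder pattern, used only to confirm which flags the compiled object carries.
pattern = re.compile(r"^function", combined)
has_multiline = bool(pattern.flags & re.M)
has_dotall = bool(pattern.flags & re.S)
print(has_multiline, has_dotall)
```

If the value printed for re.M in your session differs from what a fresh interpreter prints, something in your code has rebound the name.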

Once that's fixed, we can discuss the details of your desired RE (in brief: if the input part can include parentheses you're out of luck, otherwise re.DOTALL and some rewriting of your pattern should help) -- but fixing this weird internal error occurrence seems to take priority.

Edit: with this bug diagnosed (as per the comments below this Q), moving on to the OP's current question: the re.DOTALL|re.MULTILINE flags, plus the '$' at the end of the pattern, plus the everywhere-greedy matches (using .* instead of .*? for non-greedy), all together ensure that if the regex matches, it will match as broad a swathe as possible... that's exactly what this combo is asking for. Probably best to open another Q with a specific example: what's the input, what gets matched, what would you like the regex to match instead, etc.
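To illustrate the greedy versus non-greedy point, here is a minimal sketch. The two patterns and the sample source are made up for the demonstration (they are not the OP's actual regex), but they use the same ingredients: re.DOTALL|re.MULTILINE, a trailing '$', and .* versus .*?.

```python
import re

# Hypothetical two-function m-file contents.
source = (
    "function a = first(x)\n"
    "a = x;\n"
    "function b = second(y)\n"
    "b = y;\n"
)

flags = re.DOTALL | re.MULTILINE

# Greedy: with DOTALL, ".*" runs across newlines and backtracks only as far
# as the LAST ")" that sits at the end of a line.
greedy = re.findall(r"^function.*\)$", source, flags)

# Non-greedy: ".*?" stops at the FIRST ")" that sits at the end of a line.
lazy = re.findall(r"^function.*?\)$", source, flags)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

The greedy version returns one match that swallows both declarations and the code between them, while the non-greedy version returns the two declaration lines separately.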

Solution 2:

Here's a regular expression that should match any MATLAB function declaration at the start of an m-file:

^\s*function\s+((\[[\w\s,.]*\]|[\w]*)\s*=)?[\s.]*\w+(\([^)]*\))?

And here's a more detailed explanation of the components:

^\s*             # Match 0 or more whitespace characters
                 #    at the start
function         # Match the word function
\s+              # Match 1 or more whitespace characters
(                # Start grouping 1
 (               # Start grouping 2
  \[             # Match opening bracket
  [\w\s,.]*      # Match 0 or more letters, numbers,
                 #    whitespace, underscores, commas,
                 #    or periods...
  \]             # Match closing bracket
  |[\w]*         # ... or match 0 or more letters,
                 #    numbers, or underscores
 )               # End grouping 2
 \s*             # Match 0 or more whitespace characters
 =               # Match an equal sign
)?               # End grouping 1; Match it 0 or 1 times
[\s.]*           # Match 0 or more whitespace characters
                 #    or periods
\w+              # Match 1 or more letters, numbers, or
                 #    underscores
(                # Start grouping 3
 \(              # Match opening parenthesis
 [^)]*           # Match 0 or more characters that
                 #    aren't a closing parenthesis
 \)              # Match closing parenthesis
)?               # End grouping 3; Match it 0 or 1 times
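As a rough sketch of how this could be used from Python: the version below is an adaptation of the pattern above, with named groups added so the outputs, function name, and inputs can be pulled out directly. The group names and the helpers split_args and find_declarations are illustrative additions, and re.MULTILINE is assumed so that '^' matches at the start of every line rather than only at the start of the file.

```python
import re

# Same structure as the pattern above; only the grouping has changed.
DECL = re.compile(
    r"^\s*function\s+"
    r"(?:(?P<outputs>\[[\w\s,.]*\]|\w*)\s*=)?"
    r"[\s.]*(?P<name>\w+)"
    r"(?:\((?P<inputs>[^)]*)\))?",
    re.MULTILINE,
)

def split_args(text):
    """Split an argument list, dropping brackets, '...' and whitespace."""
    if text is None:  # the optional group did not take part in the match
        return []
    cleaned = text.strip("[]").replace("...", " ")
    return [tok for tok in re.split(r"[\s,]+", cleaned) if tok]

def find_declarations(source):
    """Return (name, outputs, inputs) for every declaration found."""
    return [
        (m.group("name"), split_args(m.group("outputs")), split_args(m.group("inputs")))
        for m in DECL.finditer(source)
    ]

source = (
    "function [mean,stdev] = stat(x)\n"
    "n = length(x);\n"
    "function mean = avg(x,n)\n"
    "mean = sum(x)/n;\n"
    "function show\n"
)
declarations = find_declarations(source)
for name, outputs, inputs in declarations:
    print(name, outputs, inputs)
```

Note that this also picks up declarations that appear later in the file, not just the first one, which may or may not be what you want.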

Whether you use regular expressions or basic string operations, you should keep in mind the different forms that the function declaration can take in MATLAB. The general form is:

function [out1,out2,...] = func_name(in1,in2,...)

Specifically, you could see any of the following forms:

function func_name                 %# No inputs or outputs
function func_name(in1)            %# 1 input
function func_name(in1,in2)        %# 2 inputs
function out1 = func_name          %# 1 output
function [out1] = func_name        %# Also 1 output
function [out1,out2] = func_name   %# 2 outputs
...
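One way to sanity-check a candidate regex against these forms is a small table-driven test. The sketch below runs the pattern from this answer over each form listed above (with the trailing comments removed) and collects any form it fails to match in full. The variable names are arbitrary.

```python
import re

PATTERN = re.compile(
    r"^\s*function\s+((\[[\w\s,.]*\]|[\w]*)\s*=)?[\s.]*\w+(\([^)]*\))?"
)

forms = [
    "function func_name",
    "function func_name(in1)",
    "function func_name(in1,in2)",
    "function out1 = func_name",
    "function [out1] = func_name",
    "function [out1,out2] = func_name",
]

# fullmatch requires the pattern to consume the entire declaration.
unmatched = [form for form in forms if PATTERN.fullmatch(form) is None]
print(unmatched)
```

Adding the declarations that actually occur in your own library to the list is a cheap way to find the cases the pattern misses.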

You can also have line continuations (...) at many points, like after the equal sign or within the argument list:

function out1 = ...
    func_name(in1,...
              in2,...
              in3)
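If you would rather not make the regex itself cope with continuations, one option is to join continued lines before matching. This is a minimal sketch under the assumption that '...' never appears inside a string literal in the lines of interest. The helper name join_continuations is made up for this example.

```python
import re

def join_continuations(source):
    """Replace each '...' continuation, the rest of its line, and the newline with a space."""
    return re.sub(r"\.\.\.[^\n]*\n", " ", source)

decl = (
    "function out1 = ...\n"
    "    func_name(in1,...\n"
    "              in2,...\n"
    "              in3)\n"
)

joined = join_continuations(decl)
# Collapse the leftover runs of whitespace so the declaration sits on one line.
one_line = " ".join(joined.split())
print(one_line)
```

After this step the declaration reads as a single line, so a line-oriented check such as startswith or a simpler regex can handle it.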

You may also want to take into account factors like variable input argument lists and ignored input arguments:

function func_name(varargin)       %# Any number of inputs possible
function func_name(in1,~,in3)      %# Second of three inputs is ignored
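Once the text between the parentheses has been extracted, these special cases can be separated from ordinary names. The following is only a sketch: describe_inputs and the keys of the dictionary it returns are hypothetical names chosen for illustration.

```python
def describe_inputs(arg_text):
    """Classify the entries of a MATLAB input list such as 'in1,~,in3'."""
    names = [arg.strip() for arg in arg_text.split(",") if arg.strip()]
    return {
        # Ordinary named inputs, in declaration order.
        "named": [n for n in names if n not in ("~", "varargin")],
        # Number of "~" placeholders (inputs the function ignores).
        "ignored": names.count("~"),
        # True when the function accepts a variable number of inputs.
        "variadic": "varargin" in names,
    }

print(describe_inputs("in1,~,in3"))
print(describe_inputs("varargin"))
```

The [^)]* part of the regex above already accepts '~', so no change to the pattern is needed to capture such lists.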

Of course, many m-files contain more than 1 function, so you will have to decide how to deal with subfunctions, nested functions, and potentially even anonymous functions (which have a different declaration syntax).

Solution 3:

How about normal Python string operations? This is just an example:

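# Print the outputs, then the inputs, of each function declaration (Python 3).
with open("file") as f:
    for line in f:
        sline = line.strip()
        if not sline.startswith("function "):
            continue
        decl = sline[len("function "):]
        # Declarations without outputs have no "=" sign.
        lhs, rhs = decl.split("=", 1) if "=" in decl else ("", decl)
        out = [s.strip() for s in lhs.replace("[", "").replace("]", "").split(",") if s.strip()]
        print(out)
        m = rhs.find("(")
        # Declarations without inputs have no parentheses.
        args = rhs[m:].replace("(", "").replace(")", "") if m != -1 else ""
        print([s.strip() for s in args.split(",") if s.strip()])

output example

$ cat file
function [mean,stdev] = stat(x)
n = length(x);
mean = sum(x)/n;
stdev = sqrt(sum((x-mean).^2/n));
function mean = avg(x,n)
mean = sum(x)/n;
$ python python.py
['mean', 'stdev']
['x']
['mean']
['x', 'n']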

Of course, MATLAB allows many other declaration forms, like function nothing, function a = b, or declarations continued across lines with ..., so check that your code handles the ones you need.
