The vast majority of the times I use ^/$, I actually want the behavior of matching start/end of lines. If I had some multi-line text, and only wanted to update or do something with the actual beginning or end of the entire text, I’d typically just do it manually.
In general, no, you should not use the match from the string. If you are getting input from a user, you want more complex processing (like stripping all whitespace), and if you are getting input from API calls, you want to either use the specified name as-is or fail.
Yes, fullmatch() will help, and so will \Z. It's just that it is so easy to forget...
Regular expressions as we basically know them today were made for ed. In that context, '$' absolutely had to match the terminating newline or it would've been completely useless.
I wish one of those regex libraries that replace the regex symbols with human-readable words would become standard. Or do they not work well?
Regex is one of those things where I have to look up to remind myself what the symbols are, and by the time I need this info again I've forgotten it all.
I can't think of anywhere else in general programming where we have something so terse and symbol heavy.
It’s been done. Emacs, for example, has rx notation. From the manual:
35.3.3 The ‘rx’ Structured Regexp Notation
------------------------------------------
As an alternative to the string-based syntax, Emacs provides the
structured ‘rx’ notation based on Lisp S-expressions. This notation is
usually easier to read, write and maintain than regexp strings, and can
be indented and commented freely. It requires a conversion into string
form since that is what regexp functions expect, but that conversion
typically takes place during byte-compilation rather than when the Lisp
code using the regexp is run.
Here is an ‘rx’ regexp that matches a block comment in the C
programming language:
(rx "/*"                    ; Initial /*
  (zero-or-more
    (or (not "*")           ; Either non-*,
        (seq "*"            ; or * followed by
             (not "/"))))   ; non-/
  (one-or-more "*")         ; At least one star,
  "/")                      ; and the final /
or, using shorter synonyms and written more compactly,
(rx "/*"
  (* (| (not "*")
        (: "*" (not "/"))))
  (+ "*") "/")
In conventional string syntax, it would be written
"/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
Of course, it does have one disadvantage. As the manual says:
The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
most interactive situations where a regexp is requested, such as when
running ‘query-replace-regexp’ or in variable customization.
Raku also has advanced the state of the art considerably.
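Python's standard library has nothing like rx, but re.VERBOSE gets part of the way there by allowing whitespace and comments inside the pattern. A rough sketch of the same C block-comment regexp, translated by hand from the manual's example (the variable name is made up for illustration):

```python
import re

# Hand translation of the rx example above; `c_block_comment` is a
# hypothetical name, not something from the Emacs manual.
c_block_comment = re.compile(
    r"""
    /\*              # initial /*
    (?:
        [^*]         # either a non-*,
      | \*[^/]       # or * followed by a non-/
    )*
    \*+              # at least one star,
    /                # and the final /
    """,
    re.VERBOSE,
)

print(bool(c_block_comment.fullmatch("/* hello */")))      # True
print(bool(c_block_comment.fullmatch("/* unterminated")))  # False
```

The symbols are still there, but each piece can carry its own comment, which helps with the "I've forgotten it all" problem.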
A lot of the time I want to check for a valid identifier:
As written, the code above is incorrect: it will happily accept "john\n", which can cause all sorts of havoc down the line.
Shouldn't you use the match returned from the string? Or use .fullmatch() (added in 3.4) to match the whole string.
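A minimal sketch of the pitfall being discussed, assuming the check looked roughly like a re.match with ^...$ anchors. The pattern and the helper names below are hypothetical, not the original poster's code:

```python
import re

# Hypothetical identifier pattern, for illustration only.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"


def is_identifier_loose(name: str) -> bool:
    # `$` also matches just before a single trailing newline,
    # so "john\n" is accepted here.
    return re.match(rf"^{IDENT}$", name) is not None


def is_identifier_fullmatch(name: str) -> bool:
    # fullmatch() requires the pattern to consume the entire string.
    return re.fullmatch(IDENT, name) is not None


def is_identifier_z(name: str) -> bool:
    # \Z matches only at the very end of the string.
    return re.match(rf"{IDENT}\Z", name) is not None


print(is_identifier_loose("john\n"))      # True
print(is_identifier_fullmatch("john\n"))  # False
print(is_identifier_z("john\n"))          # False
```

All three accept plain "john"; they only differ on the trailing newline.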
For this to matter, it seems that I would have to be in the situation of:
* running a regex not in multi-line mode
* on input that was presumably split from multiple lines, or within a line of multi-line input
* wherein I care whether the line in question is the last line of input without a trailing newline
* but I didn't check, or `.strip()` or anything
I can't say I recall ever being bitten by this.
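A small sketch of the situation those bullets describe, using made-up input. Lines iterated from a file-like object keep their trailing newline, except possibly the last one, and $ without re.MULTILINE matches either way, so the difference only surfaces if the matched line is later used without stripping:

```python
import io
import re

# Made-up input: the last line has no trailing newline.
text = "alpha\nbeta\ngamma"

pattern = re.compile(r"[a-z]+$")

# Iterating a file-like object keeps the "\n" on every line but the last.
results = [(line, pattern.match(line) is not None) for line in io.StringIO(text)]

for line, matched in results:
    print(repr(line), matched, repr(line.strip()))
```

Every line matches, with or without the newline; after .strip() the strings are identical anyway.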
And there is also nothing here to justify \A over ^.
[flagged]
This thread is about regular expressions in Python.
What you said is not wrong. Here's the article, in case you missed it
https://www.reuters.com/world/us/evidence-contradicts-trump-...
so why \A instead of ^?
\A always matches the start of the string, but in multiline mode, ^ will match both the start of the string and the start of each line:
https://docs.python.org/3/library/re.html#re.MULTILINE
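A quick sketch of that difference, using a made-up three-line string:

```python
import re

text = "first\nsecond\nthird"

# Without re.MULTILINE, ^ and \A both match only at the start of the string.
default_caret = re.findall(r"^\w+", text)
default_a = re.findall(r"\A\w+", text)

# With re.MULTILINE, ^ also matches right after each newline; \A is unchanged.
multiline_caret = re.findall(r"^\w+", text, re.MULTILINE)
multiline_a = re.findall(r"\A\w+", text, re.MULTILINE)

print(default_caret)    # ['first']
print(default_a)        # ['first']
print(multiline_caret)  # ['first', 'second', 'third']
print(multiline_a)      # ['first']
```

So \A only buys you something when the pattern might be compiled with re.MULTILINE.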
it's in the spec. Since forever, like v 1.3? don't remember.
And it is the same in Perl; from `man perlre`:
I've said it before and I'll say it again, I'd like Python a lot more if it abandoned re and handled regex like perl did.
I've never used perl. What's the difference?
It doesn't need an import at all. It's just a normal part of the language's syntax and can be used just about anywhere:
The captures ($1, $2, etc.) are global and usable wherever you need them.
In this particular case the default is that $ matches the end of a string without a newline, but you can include it anytime you need to:
ABC: Always. Build on. Parser Combinators.
Python ecosystem has several options, for instance: https://parsy.readthedocs.io/en/latest/tutorial.html
They could simply advise to use boundaries '\b' instead.
Which would also match whitespace in addition to the \n they’re trying to avoid matching?
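A short sketch of why \b does not help for validation, with made-up candidate strings. \b is a zero-width boundary between a word character and a non-word character (or the edge of the string), so it is satisfied before "\n" and before a space alike:

```python
import re

pattern = re.compile(r"^\w+\b")

candidates = ["john", "john\n", "john smith"]

# Map each candidate to whether the pattern matches at the start.
results = {c: pattern.match(c) is not None for c in candidates}

for candidate, matched in results.items():
    print(repr(candidate), matched)
```

All three candidates match, so the trailing newline still gets through, and so does anything after a space.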