[Solved] Convert commonmark links to Headings with spaces to GitHub flavored markdown.
harsh3466 @ harsh3466 @lemmy.ml Posts 18Comments 531Joined 1 yr. ago
Okay, here's the command and a breakdown. I broke down every part of the command, not because I think you are dumb, but because reading these can be complicated and confusing. Additionally, detailed breakdowns like these have helped me in the past.
The command:
The breakdown:
- `sed` - calls sed
- `-r` - allows for the use of extended regular expressions
- `-i` - edits the file given as an argument at the end of the command in place (note: if the two flags are combined into one, as in `-ri`, the `i` must follow the `r`; GNU sed treats anything written directly after `-i` as a backup suffix, so `-ir` would not turn on extended regular expressions)

Now the regex, piece by piece. This command has two substitution expressions, to break the goal down into manageable chunks.
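Before getting into the expressions, here is a small sketch of what the two flags do together. It assumes GNU sed (BSD/macOS sed wants an argument after `-i`), and the scratch file and the `a+` pattern are made up purely for illustration:

```shell
# Assumption: GNU sed. The scratch file and the pattern are illustrative only.
tmp=$(mktemp)
printf 'aaa\n' > "$tmp"

# -r makes + mean "one or more"; -i writes the result back into the file
sed -r -i 's|a+|b|' "$tmp"

flags_result=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$flags_result"   # prints: b
```

Without `-r`, the `+` would be treated as a literal plus sign and nothing in the file would change.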
Expression one is to convert the markdown links to lowercase. That expression is:
's|]\(#.+\)|\L&|;
The goal of this expression is to find the markdown links and to ignore the https links. In your post you indicate that the markdown links all start with a `#` symbol, so we don't have to explicitly exclude the https links; we just have to match the links that start with `#`. Here's the breakdown:

- `'` - begins the entire expression set. If you had to match the `'` character in your expression, you would begin the expression set with `"` instead of `'`.
- `s|` - invokes find and replace (substitution). Note, I'm using `|` as the separator instead of `/` for easier readability. In `sed`, you can use just about any separator you want in your syntax.
- `]\(#` - This is how we find the link we want to work on. In markdown, the url of every link is preceded by `](`, which closes the link text and opens the actual url. In the expression, the `(` is preceded by a `\` because it is a special regex character, so `\(` tells `sed` to find an actual opening parenthesis character. Finally, the `#` is the first character of the markdown links we want to convert to lowercase, as indicated by your example. Including the `#` ensures no https links get caught up in the processing.
- `.+` - this bit has two parts, `.` and `+`. These are two special regex characters. The `.` tells `sed` to find any character at all, and the `+` tells it to find the preceding character one or more times. So `.+` tells `sed` to find one or more of any character. You might think this will eat ALL of the text in the document and make it all lowercase, but it will not, because of the next part of the regex.
- `\)` - this tells `sed` to find a closing parenthesis. Like the opening parenthesis, it is a special regex character and needs to be escaped with the backslash to tell `sed` to find an actual closing parenthesis character. This is what stops the command from converting the entire document to lowercase, because when you combine the previous bit with this bit, like so: `.+\)`, you're telling `sed` to find one or more of any character up to a closing parenthesis. Note that `.+` is greedy, so if a line contains more than one closing parenthesis, the match runs to the last one on that line.
- `|` - This tells `sed` we're done looking for text to match. The next bits are about how to modify/replace that text.
- `\L` - This tells `sed` to convert the given text to all lowercase.
- `&` - This is the given text to modify. In this case the `&` is a special metacharacter that tells `sed` to modify the entire pattern matched in the matching portion of the expression. So when the `&` is preceded by the `\L`, this tells `sed`: take everything that was matched in the pattern matching expression and convert it to lowercase.
- `;` - this tells `sed` that this is the end of the first expression, and that more are coming.

So all together, what this first expression does is: find a closing bracket followed by an opening parenthesis followed by a pound/hash symbol followed by one or more of any character, up to a closing parenthesis. Then convert that entire chunk of text to lowercase. Because symbols don't have case, you can just convert the entire matched pattern to lowercase. If there were specific parts that had to be kept case sensitive, then you'd have to match and modify more precisely.
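To see the first expression on its own, here is a minimal sketch that pipes a few made-up lines through it instead of editing a file. It assumes GNU sed (`\L` is a GNU extension), and the sample links are invented for illustration, not taken from your document:

```shell
# Assumption: GNU sed. The sample links below are hypothetical.
input='[Some Heading](#Some%20Heading)
[Example](https://Example.com/Some%20Page)'

# Only the link starting with # is lowercased; the https link is left alone
lowered=$(printf '%s\n' "$input" | sed -r 's|]\(#.+\)|\L&|')
printf '%s\n' "$lowered"
# [Some Heading](#some%20heading)
# [Example](https://Example.com/Some%20Page)

# .+ is greedy: with a second ")" later on the same line, the match runs to it
greedy=$(printf '%s\n' '[A](#Foo) and (Bar)' | sed -r 's|]\(#.+\)|\L&|')
printf '%s\n' "$greedy"
# [A](#foo) and (bar)
```

The second case is worth checking against your own file: if a heading link shares a line with other parenthesized text, that text gets lowercased too.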
The next expression is pretty easy, UNLESS any of your https links also include the string `%20`. If no https links contain the `%20` string, then this will do the trick:
s|%20|-|g'

- `s|` - again opens the expression, telling `sed` we're looking to substitute/modify text
- `%20` - tells `sed` to find exactly the character sequence `%20`
- `|` - ends the pattern matching portion of the expression
- `-` - tells `sed` to replace the matched pattern with the exact character `-`
- `|` - tells `sed` that's the end of the modification instructions
- `g` - tells `sed` to do this globally throughout the document. In other words, to find all occurrences of the string `%20` and replace them with the string `-`
- `'` - tells `sed` that is the end of the expression(s) to be evaluated.

So all together, what this expression does is: within the given document, find every occurrence of a percent symbol followed by the number two followed by the number zero, and replace each one with the dash character.
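A quick sketch of the second expression by itself, again on invented sample text rather than your file. The second line is there to show the caveat above: an https link containing `%20` is rewritten as well.

```shell
# The sample links below are hypothetical.
input='[Some Heading](#some%20long%20heading)
[Example](https://example.com/a%20b)'

# g replaces every %20 on every line, inside any kind of link
dashed=$(printf '%s\n' "$input" | sed 's|%20|-|g')
printf '%s\n' "$dashed"
# [Some Heading](#some-long-heading)
# [Example](https://example.com/a-b)
```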
- `/path/to/somefile` - tells `sed` what file to work on.

Part of using regex is understanding the contents of your own text, and with the information and examples given, this should work. However, if the markdown links have different formatting patterns, or, as mentioned, any of the https links have the `%20` string in them, or other text in the document might falsely match, then you'd have to provide more information to get a more nuanced regex to match.

Edit: clarified the use of the `&` metacharacter.

Edit 2: clarified that the `+` metacharacter indicates finding the preceding character (or character set) one or more times.
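If you want to try everything on a throwaway copy first, here is a sketch that puts the two expressions from the breakdown together and runs them on a scratch file. The exact spelling of the flags (`-r -i` written separately) and the file contents are assumptions for the test, and it needs GNU sed:

```shell
# Assumptions: GNU sed; the flags are written separately; the file contents are made up.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Some Heading
Jump to [Some Heading](#Some%20Heading).
Visit [the docs](https://example.com/docs).
EOF

# Expression one lowercases the # links, expression two swaps %20 for -
sed -r -i 's|]\(#.+\)|\L&|;s|%20|-|g' "$tmp"

result=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
# # Some Heading
# Jump to [Some Heading](#some-heading).
# Visit [the docs](https://example.com/docs).
```

Once the output looks right, swap the scratch file for your real path, ideally after making a backup.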