[Solved] Convert commonmark links to Headings with spaces to GitHub flavored markdown.
N0x0n (@N0x0n@lemmy.ml) · Posts 22 · Comments 774 · Joined 2 yr. ago
Hello :) I promise this is the last time I will bother you (I know what you are going to say :P) ! If it's not too much, could you give me just a few hints on how I could improve the final script a bit?
    #! /bin/bash

    files="/home/USER/projects/test.md"

    mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
    mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"

    while IFS= read -r line; do
      # Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9])
      dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
      sed -i "s/$line/${dashlink}/" "$files"

      # Puts everything to lowercase after a hashtag
      lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
      sed -i "s/$dashlink/${lowercaselink}/" "$files"

      # Removes spaces (%20) from markdown links after a hashtag
      spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
      sed -i "s/$lowercaselink/${spacelink}/" "$files"
    done <<<"$mdlinks2"
This works perfectly and fulfills all my needs (thanks !!) ! However, I'm not very fond of the variable string manipulation ($mdlinks2). If you have some tips without spoiling too much, that would be great; otherwise it's okay, it works exactly how I imagined it and ticks all use cases. Also, if you could give some pointers for an overall improvement, or if you see something that could potentially create some strange loop or looks off, feel free to comment in your spare time :).
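For what it's worth, one direction the clean-up could take, as a rough sketch rather than a finished script: do the whole conversion in a single pass instead of calling sed -i once per link and per rule. In the script above, each link ends up as the search pattern of a sed -i call, so characters like . or ? are read as regex syntax, and a / in a link would clash with the s/…/…/ delimiter. The sketch swaps those per-line sed calls for awk (that swap is an assumption, nobody in the thread suggested it), convert_fragments is a made-up name, and the three rules are the ones from the script.

```shell
#!/bin/sh
# Sketch only: convert_fragments is a hypothetical helper, and awk replaces
# the per-line "sed -i" calls. Reads Markdown on stdin, writes to stdout.
convert_fragments() {
  awk '
    {
      line = $0
      out = ""
      # Handle one "](target#fragment)" link at a time, left to right.
      while (match(line, /\]\([^)#]*#[^)]*\)/)) {
        start = RSTART               # saved: the inner match() resets them
        len = RLENGTH
        link = substr(line, start, len)
        hash = index(link, "#")
        target = substr(link, 1, hash)    # e.g. "](file.md#"
        frag = substr(link, hash + 1)     # e.g. "1.3%20Subtitles)"
        if (target !~ /^\]\(https?:/) {
          # 1.2 -> 1-2 (repeats, so 1.2.3 becomes 1-2-3)
          while (match(frag, /[0-9]\.[0-9]/))
            frag = substr(frag, 1, RSTART) "-" substr(frag, RSTART + 2)
          frag = tolower(frag)            # lowercase after the hashtag
          gsub(/%20/, "-", frag)          # encoded spaces -> dashes
        }
        out = out substr(line, 1, start - 1) target frag
        line = substr(line, start + len)
      }
      print out line
    }
  '
}

printf '%s\n' \
  '[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1.3%20Subtitles)' \
  '[Just a link](https://mylink/%20with%20space.com) %20' |
  convert_fragments
```

Something like convert_fragments < test.md > test.new.md would then replace the loop. The file name before the # and any https link are left untouched, and so is a %20 outside of a link.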
Another question which has nothing to do with the post and gets a bit off topic... You gave me the right push I needed and I saw the power and usefulness of proper knowledge of sed/bash/Perl. It's time I finally learn a scripting language ! I want to hear your opinion: what tools would you recommend? Most people would say Python for beginners, but I have heard so many good things about Perl (ExifTool is a good example of how powerful Perl can be), though the syntax scares me off a little bit compared to Python.
Any good book material you have in mind for a beginner?
Thanks again for everything !!!
First, thanks again for sharing your knowledge with me, I really appreciate the time/effort you took to write all of this. I know those are a lot of thank yous :/ but I'm really grateful for all of this; this is very valuable information I will keep in my knowledge base. It's really time I learn proper bash/python/Perl? scripting with all those tools (grep/sed/regex).
Second, YOU MISSED A DAMNED parenthesis you fool xD !

    mdlinks="$(grep -Po ']\((?!https).*\)' ~/mkdn)"

Took me some time to figure it out with a very uninformative error:

    bashscript.sh: line 8: unexpected EOF while looking for matching "'

but as expected it works !
    From
    -------
    [Just a test](#Just%20a%20test.md)
    [Just a link](https://mylink/%20with%20space.com)
    %20

    To
    -------
    [Just a test](#Just-a-test.md)
    [Just a link](https://mylink/%20with%20space.com)
    %20
Next, to show you my appreciation and not take everything for granted and be spoon-fed for everything, I tried to find a solution myself for something else. I will try to explain as best I can how I solved it.
    From
    -------
    [Just a test](Another%20markdown%20file.md#Hello%20World)

    To
    -------
    [Just a test](Another%20markdown%20file.md#hello-world)
The part before the hashtag needs to keep its initial form (it links to the original markdown file). So, since just playing around with Perl and regex doesn't end well when done blindly without the proper knowledge, I did some simple string manipulation. It's not very elegant but does the trick, thanks to your well-written breakdown.
- I printed out the $mdlinks variable just to see what it prints out
- Copied and changed your Perl/regex to find the first hashtag (#) and save it into a new variable ($mdlinks2)
- Fed your $mdlinks variable into my new Perl/regex
- Fed my new variable into done? (I'm a bit confused here but okay xD)
    #! /bin/bash

    mdlinks="$(grep -Po ']\((?!https).*\)' "/home/dany/newtest.md")"
    echo $mdlinks

    mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"
    echo $mdlinks2

    while IFS= read -r line; do
      dashlink="$(echo "$line" | sed 's|%20|-|g')"
      sed -i "s/$line/${dashlink}/" "/home/dany/newtest.md"
    done <<<"$mdlinks2"
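About the "feed my new variable into done" part: as far as I understand it, the here-string after done sends the variable's content to the standard input of the whole while loop, and read then takes one line per iteration. A tiny sketch with made-up data (the links and count variables are only there for illustration):

```shell
#!/bin/bash
# Made-up input: two lines stored in one variable.
links='#Just%20a%20test
#Hello%20World'

count=0
# The here-string after "done" becomes the standard input of the loop,
# so "read" receives one line per iteration until the input runs out.
while IFS= read -r line; do
  count=$((count + 1))
  printf 'line %d: %s\n' "$count" "$line"
done <<<"$links"
```

This prints one numbered line per entry. Because the loop is fed by a here-string and not by a pipe, it runs in the current shell, so count is still set after the loop.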
Yes, not very elegant, but it's the best I could do currently :/ However, I still got a YES effect :P
To answer your question:
Quick question as I’m working on this, in the new link example, is the BDMV and other capitalized text in this link supposed to be converted to lowercase, or to remain uppercase?
As you can see in my string manipulation above, the part before the # needs to keep its original form :) (Sorry, I wasn't aware of this before working with the original files.) I solved it with some string manipulation as shown above.
I'm a bit tired from all this searching/trial & error; tomorrow I will try to wrap everything up and answer your post below :) ! Also, I need to clean up the mess I made in my home directory xD.
Thanks again for your help ! Have a good night/day !
Hello !!!
Sorry for the very late response, I had something else to do. I will read everything carefully and respond to every post :) I also thought about it overnight and I think that sed and regex weren't the best option here (as others have mentioned).
I think a Python script or bash (as you mentioned a bit later) would be a better way. I'm sorry that I put you through all of this... wrong tool for the job :s.
Heeey ! Take that back !! We aren't crazy, just different. That's cool, just enjoy whatever you think is good for your mental health !
Just like some people love healthy meals and others can't live without processed food 😏
Sure :)
I don’t know if it's still a thing, but in the past some web URLs had spaces in their addresses, e.g.
https://www.my/%20website%20with%20spaces.com
In markdown you can link to external web addresses like so:
[some link to a web address](https://my/%20website%20with%20spaces.com)
However, /https/ ! s|%20|-|g replaces all occurrences of %20 (which is considered a space in HTML? Sorry if I’m wrong here :s still have a lot to learn) with -. This would break the link to the web URL: [some link to a web address](https://my-website-with-spaces.com/). Am I wrong here?
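A quick way to check this on a couple of made-up lines (the sample text and variable names are assumptions, GNU sed assumed): run the substitution once without and once with the /https/ ! address and compare.

```shell
#!/bin/sh
# Two sample lines: a local link and a web link (both made up for illustration).
sample='[Some text](some%20text.md)
[some link to a web address](https://my/%20website%20with%20spaces.com)'

# Without the address, every %20 becomes a dash, web link included.
unguarded=$(printf '%s\n' "$sample" | sed 's|%20|-|g')

# With "/https/ !", lines that contain "https" are skipped entirely.
guarded=$(printf '%s\n' "$sample" | sed '/https/ ! s|%20|-|g')

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
```

In the unguarded run the web link comes out as https://my/-website-with-spaces.com, while the guarded run leaves that line alone and still turns some%20text.md into some-text.md. The address works per line, so a local link sharing a line with an https link would be skipped too.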
If I may, I just found something else that doesn't quite work 😅 and it seems a bit harder to fix, I think ! Sometimes I have links in this form:
[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1.3%20Subtitles)
As you can see, I prefix the header with 1.3 but, as dumb as it is... it also needs to be 1-3-subtitles, e.g.
[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1.3%20Subtitles)
needs to become
[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1-3-Subtitles)
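A minimal sketch of just the numbering part, assuming GNU sed: turn a dot between two digits into a dash, but only after the # inside a link target, so the link text and the file name keep their dots. number_dots_to_dashes is a made-up name, and the %20 part is deliberately left for a separate step.

```shell
#!/bin/sh
# Hypothetical helper: 1.3 -> 1-3, only inside the "#fragment" of a link.
number_dots_to_dashes() {
  sed -E \
    -e ':dots' \
    -e 's/(\]\([^)#]*#[^)]*[0-9])\.([0-9])/\1-\2/' \
    -e 't dots'
}

printf '%s\n' '[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1.3%20Subtitles)' |
  number_dots_to_dashes
```

The loop (label plus t) repeats the substitution until nothing changes, so a third-level heading like 1.2.3 should not need an extra [0-9] group.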
Sorry for my bad English, trying my best haha ! Hope it's comprehensible.
Edit:
I don't know why, but lemmy adds /%20 instead of %20 in my fake URLs ://
Haha we cross-replied !
.* did the trick and removes my additional s|]\(.+#.+\) to include that pattern from my last reply !
Last question: /https/ ! s|%20|-|g changes all occurrences of %20 in the whole file except on lines that contain https. Is there any way to change only the occurrences that appear in the markdown link pattern []() ?
e.g. replace in [Some text](some%20text.md) but not in Hello I'm just some%20place holder text ?
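One way this could look, as a sketch and not a definitive answer (GNU sed assumed, dash_link_spaces is a made-up name): anchor the substitution on the ]( that opens a link target and repeat it until no %20 is left before the closing parenthesis. The /https/ ! guard from above is kept as is.

```shell
#!/bin/sh
# Hypothetical helper: turn %20 into "-" only inside a Markdown link target,
# and skip any line that contains "https" (same guard as in the thread).
dash_link_spaces() {
  sed -E \
    -e ':again' \
    -e '/https/! s/(\]\([^)]*)%20/\1-/' \
    -e 't again'
}

printf '%s\n' \
  '[Some text](some%20text.md)' \
  'Hello, just some%20placeholder text' \
  '[A link](https://my/%20home%20page.com)' |
  dash_link_spaces
```

Only the first line changes, to [Some text](some-text.md). The plain text line has no ]( in front of its %20, so it is left alone, and the web link is skipped by the guard.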
Thanks again for your easy-to-read and very informative walkthrough ! 🤩
Sorry to spam your unread message 😅 !
I played around a bit and came to the following conclusion:
- s|]\(#.+\)|\L&| - Works great for in-document links, so I further expanded it to s|]\(#.+\)|\L&|;s|]\(.+#.+\)|\L&| to also cover the following pattern: [Some Text](readme.md#hello%20world.md)
- s|%20|-|g - Works on every occurrence of %20, even for the following pattern: [Some text](https://my/%20home%20page.com), which would break all external links to the web. So I used this: /https/ ! s|%20|-|g
What I'm doing is probably very sloppy and not as elegant as your command, but it does the trick :) If you want to expand on it further, feel free; however, the following command does exactly what I wanted:
sed -re 's|]\(#.+\)|\L&|;s|]\(.+#.+\)|\L&|;/https/ ! s|%20|-|g'
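For reference, a quick check of that command on three made-up lines (GNU sed assumed, since \L is a GNU extension; lower_and_dash is just a name picked for the example):

```shell
#!/bin/sh
# The command from the post, wrapped in a function (the name is made up).
lower_and_dash() {
  sed -re 's|]\(#.+\)|\L&|;s|]\(.+#.+\)|\L&|;/https/ ! s|%20|-|g'
}

printf '%s\n' \
  '[Just a test](#Just%20a%20test)' \
  '[Some Text](Another%20File.md#Hello%20World)' \
  '[Some text](https://my/%20home%20page.com)' |
  lower_and_dash
```

The first line becomes [Just a test](#just-a-test) and the web link stays as it is. The second one becomes [Some Text](another-file.md#hello-world): the second expression lowercases the whole target, so the file name before the # is lowercased and dashed as well, which may or may not be what is wanted.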
Thanks again from the bottom of my heart !
Thank you, thank you very much for taking the time to help me out here ! I really appreciate your full breakdown and complete walkthrough ! I haven't tried it out yet, but skimming through your post I'm sure it will work out !
However, I forgot to mention something:
The goal of this expression is to find markdown links, and to ignore https links. In your post you indicate the markdown links all start with a # symbol, so we don’t have to explicitly ignore the https as much as we just have to match all links starting with #.
This is only true for links in the same file. If I link to another file, it looks something like this:
[Why SVT-AV1 over AOM?](readme.md#Why%20SVT-AV1%20over%20AOM?)
I can try to wrap my head around it and find a solution by myself; with your well-written breakdown I'm sure I can try something out. But if you think it will be too complex for my limited knowledge, feel free to adjust :).
Do you mind if I ping you if I'm not able to solve the issue?
Thanks again !!!! 👍
Oupsi ! Forgot the 20 there ! 😅
Hello,
I have thought of a Python script and looked around a bit but couldn't find something satisfactory. Also, I'm a tiny bit more versed in bash/CLI than in Python... Even though that's very arguable !
I looked through the GitHub repo and at first glance I have no idea how this could do the job. Again, I probably have to dig a bit deeper and understand what this is actually doing !
Thanks for the pointer will give it a try :)
This would be awesome ! A breakdown of the whole command will give me a better understanding !
Thank you in advance, waiting for your post :)
Hello :) Thanks for your reply !
That's exactly what I did and how I came to my "final" result, but it doesn't work as expected... because of my lack of knowledge and understanding !
Will give sd a try and see if I can come up with something ! Thanks for the pointer !
I know that feeling ! My first service hosted via docker + Traefik outside my LAN with a WireGuard tunnel felt like a big dopamine hit ! Congrats !
Now I have over 20 services and it feels trivial :( I still love the easy to read/write syntax of Traefik, however I feel like I'm missing a lot of important networking knowledge while avoiding Nginx !
Maybe one day when I'm too bored I will switch everything to Nginx and see how it goes !
From what I understand, F-droid regularly audits a few new apps for malicious code
That's a good point, but how can malicious code be added to source code from GitHub? I mean, if you only use trusted application repos (most of them are already on F-droid anyway) there shouldn't be any concern, right?
But reading from the link you posted, there's some chance of a MITM attack sending a malicious payload directly to Obtainium? (Correct me if I'm wrong).
GitHub is not necessarily the same source used to generate their binaries.
Didn't know that :/
Thanks for sharing your knowledge !
Care to elaborate? I do not fully understand the meaning of your claim :/. I use Obtainium for everything and haven't had any issues until now.
Still curious about what you meant, from your perspective.
They have a working fix underway ! Check their GitHub repo :)
That's some of the weirdest/coolest stuff I have ever heard of ! I have absolutely no idea what I'm looking at but somehow I want one of those !
That's the kind of cool niche stuff that is missing here on Lemmy !
Hello :) Sorry for the late response !!! I was busy working it out with another user ! However, out of curiosity I gave your sed regex a try, but there seems to be a missing ( somewhere ! I tried to fix the issue but your regex is way beyond my capabilities ! If you are a sed/regex fanatic and want to give it another try, feel free :). Right now I have found a solution with another user that works great; here's the script in question if you are interested:
It's not very elegant but it does the job... While working on it with another very friendly user I came across other things I hadn't thought of, like:
[Just a placeholder](#1.2%20Just%20a%20link%20to%20header)
[Just a placeholder](Another%20File.md#1.2%20Just%20a%20link%20to%20header)
[Just a placeholder](Another%20File.md#1-2-just-a-link-to-header)
Well, I think that bare-bones sed/regex wasn't the right tool, but in a bash script it does exactly what I'm expecting :)
Thanks for your help and pointers !