Mass editing text files too


The real reason I bumped this post is the following:

“Tor is written for and supported by people like you.”
“Tor jest pisany dla i wspierany przez ludzi takich, jak Ty.”
“Tor {is,wordt} geschreven voor en ondersteund door mensen zoals jou.”

Of course, I do have to keep it ‘up to date’ and not wait 2 months before updating it. 😉
http://tor-relay.thruhere.net/tor/
http://tor-relay.thruhere.net/tor/dist/

This post picks up where I left off in my last post.

Let’s start with this one:
Insert (arbitrary text) into the middle of some text file.

#!/bin/bash
## Modified: Today by E.l.f.
#
## Script-name - insert_html.sh
# -> Commented out command.
## -> My comments/explanation(s).
RED="\033[0;31m"
BLUE="\033[1;34m"
CYAN="\033[1;36m"
YELLOW="\033[1;33m"
NC="\033[0m"
if [ "$USER" = root ]; then
  echo -e $RED"   Are you Insane!"
  echo -e $CYAN"    Error: In order to use this script, one must NOT be $USER"
  echo -e $YELLOW"    Exiting..."$NC
  exit 1 ## Non-zero exit status, since this is an error.
else
  echo ""
  echo -e $BLUE"    $USER may proceed."$NC
  echo -e $CYAN"    May peace be with you."$NC
fi
clear # Clear the screen.
## Could be replaced with a for arg in something; do ...
echo -n "Enter filename: "
read -r INPUT ## -r keeps backslashes in the input literal.
## There's no need to type the extension after the filename.  ; - )
## Just let the script take care of that.
INPUT+=".html"
## Use your "grep.txt"\!
# e.g. grep -in \<div\ class=\"main-column\"\> *.html >> grep.txt
## That last command will output the exact line number where you wish to insert
## something else.
if [ ! -f "$INPUT" ]; then
  echo ""
  echo -e $RED"Error: File doesn't exist!"$NC
  echo -e $YELLOW"Exiting..."$NC
  echo ""
  exit 1 ## Non-zero exit status, since this is an error.
fi
## Yep, smilies still don't work well within  delimiters on wordpress.
## But be(a)ware of the single quote!!!  : - D : - D
## See:
# echo 'Why can'\''t I write '"'"'s between single quotes'
## As an example in why this is almost impossible.
## This particular widget uses ' quotes, hence my warning to you. ; - )
## Must be an exact match from the start of the line [^] (this includes spaces et al.)
## -i edits the file in place and the a command appends after the matching line.
# sed -i '/^searchForSomeString/a \
# Append sometext to it' YourFile.someExt
#
## Your mileage may vary and so does your text perhaps/possibly. ; - )
## And yes spaces (10 in this case) not tabs (\t), my tabs convert into two spaces while editing.
## Author's preference nothing more.
## FOUND!  : - D
sed -i '/^        <th colspan="3" align="center">Advanced Bash-Scripting Guide:<\/th>/a \
      <\/tr>\
      <tr>\
        <td>&nbsp;</td>\
        <td width="80%" align="center" valign="bottom">\
          <div id="google_translate_element"></div>\
          <script type="text/javascript">\
          //<![CDATA[\
          function googleTranslateElementInit() {\
          new google.translate.TranslateElement({\
            pageLanguage: "en",\
            includedLanguages: "af,sq,ar,hy,az,eu,be,bg,ca,zh-CN,zh-TW,hr,cs,da,nl,en,et,tl,fi,fr,gl,ka,de,el,ht,iw,hi,hu,is,id,ga,it,ja,ko,lv,lt,mk,ms,mt,no,fa,pl,pt,ro,ru,sr,sk,sl,es,sw,sv,th,tr,uk,ur,vi,cy,yi",\
            layout: google.translate.TranslateElement.InlineLayout.HORIZONTAL\
          }, "google_translate_element");\
          }\
          //]]>\
          </script>\
          <script src=\
          "http://translate.google.com/translate_a/element.js?cb=googleTranslateElementInit"\
          type="text/javascript">\
          </script>\
        </td>\
        <td>&nbsp;</td> ' "$INPUT"
exit 0

Here is what the script used to look like (well, at least a part of it):

# echo -n "Enter line-number here (e.g. 36, 41, 42): "
# read LINE
# LINE=$LINE
## Again choose carefully\!
# echo -n "Read the same file from line (usually this would be +1): "
# CNT=$(echo $(($LINE+1)))
## Just in case\!
# cp "$INPUT" "$INPUT".jic
# head --lines=$LINE "$INPUT" > head.html
# tail -n +$CNT "$INPUT" > tail.html
## The google.txt file is assumed to be in the WD,
## it could be any textfile you wish to insert though.
## Which in my case would be a simple translator's gadget.
## Adjust your spaces and tabulations (if applicable!?)
# cat google.txt >> head.html
# cat tail.html >> head.html
# \mv head.html "$INPUT"
# \rm tail.html
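
For reference, the same cut-and-glue idea can be sketched as a small runnable function. This is only an illustration: insert_after_line is a hypothetical name, the file names are throwaway examples, and it assumes GNU head, tail and mktemp.

```shell
#!/bin/bash
# Sketch: insert the contents of a snippet file after a given line number.
# insert_after_line is a hypothetical helper, not part of the original script.
insert_after_line() {
  local file=$1 line=$2 snippet=$3 tmp
  tmp=$(mktemp) || return 1
  cp -- "$file" "$file.jic"              # "just in case" backup, as in the old script
  {
    head -n "$line" -- "$file"           # everything up to and including line N
    cat -- "$snippet"                    # the text to insert
    tail -n "+$((line + 1))" -- "$file"  # the rest, starting at line N+1
  } > "$tmp" && mv -- "$tmp" "$file"
}

# Demo on throwaway files in a temporary directory.
demo=$(mktemp -d)
printf '%s\n' one two three > "$demo/page.html"
printf '%s\n' INSERTED > "$demo/google.txt"
insert_after_line "$demo/page.html" 2 "$demo/google.txt"
cat "$demo/page.html"
```

Note that mv replaces the original file with the temporary one, so the file permissions may need to be reset afterwards.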

As you can see, I’ve come a long way from just cutting up a piece of paper, inserting some lines and then gluing it back together (so to speak)! 😆 The trick was figuring out that sed can indeed insert multiple lines, although one must be wary of single quotes. 😆 The script at the top features the Google Translate widget I meant to insert into a bunch of HTML files. An example can be viewed here. But this could be used for any purpose whatsoever, hence this publication. I must admit that there may (always) be a quicker and dirtier way to do this. It still beats having to open ‘n‘ files and edit all of this in manually :-o.
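
One possibly quicker route, sketched here under the assumption that GNU sed is available: the r command appends the contents of a file after every matching line, so the text to insert can live in its own file and its single quotes never touch the shell. The file names and the pattern below are made up for the demo.

```shell
#!/bin/bash
# Sketch: append a snippet file after every line matching a pattern (GNU sed).
# page.html and snippet.txt are throwaway example files.
demo=$(mktemp -d)
printf '%s\n' '<tr>' '  <th>Guide</th>' '</tr>' > "$demo/page.html"
printf '%s\n' "  <td>it's fine</td>" > "$demo/snippet.txt"

# "r FILE" queues the contents of FILE for output after the matching line.
# The file name runs to the end of the sed script, so it goes last.
sed -i -e '/<th>Guide<\/th>/r '"$demo/snippet.txt" "$demo/page.html"
cat "$demo/page.html"
```

The snippet is copied verbatim, apostrophe included, which sidesteps the quoting gymnastics needed when the text sits inside the sed script itself.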

So now that is done with. On to mirroring tor.
Done!

How?
(Or, if you prefer, go here.)
First we do:

rsync -av --delete rsync://rsync.torproject.org/tor tor-mirror/  ## I chose 'current' in this instance.

Then (Smallish update for today the 22nd of April.):

#cd into wd (working directory).
#Download the mirror.
rsync -av --delete rsync://rsync.torproject.org/tor tor-current/
cd tor-current
find -iname 'index.html' | wc -l ## None.
## thttpd sends these as 'text/plain'
find -iname 'index.html.en' | wc -l ## Just a few.
find -iname 'index.html.en' -exec ls -lh "{}" \; ## Where are they.
find -iname 'makefile' -exec ls -lh "{}" \; ## More of a necessity for the developer(s)
find -iname 'makefile' -exec \rm "{}" \; ## No need for those on a mirror.
find -iname 'makefile' -exec ls -lh "{}" \; ## Check.
find -iname 'en' -exec ls -lh "{}" \; ## These folders contain *.wml files.
find -iname 'en' -exec \rm -r "{}" \; ## -r because these are folders.
## Also removed any reference to the 'languageswitch' cgi script
## As well as any other shell script encountered.
history >> tor.s.history.txt ## Just to keep track of my own changes.  ; - )
## No need for GID but sticky? Yes.
find . -type d -exec chmod -s {} \; # Remove the setuid and setgid bits!
find . -type d -exec chmod 1755 {} \; # Folders are set to sticky.
## dist is gonna go on its own slice due to lack of space in /var/www.
# alias Bind='sudo mount -o bind' # /sourcefolderurl /destinationfolderurl
# ztar dist.tgz dist/ # alias ztar='tar cvzf ' # Name.tgz folder/
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
## In tor's wd.
## Please note that I consider it a safer approach to work in one directory at a time!
## This instead of:
# find -iname '*someName*' -exec someThing "{}" \;
cd about
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">@<a target="_blank" href="https://blog.torproject.org/blog/">@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">@<a target="_blank" href="http://printfection.com/torprojectstore">@' *.html
cd ../docs
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../donate
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../download
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../eff
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../getinvolved/
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../press
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../projects/
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ../torbutton/
for i in *.html.en;do mv "$i" $(basename "$i" .html.en).html;done
sed -i 's@.html.en@.html@' *.html
sed -i 's@<h1 id="logo"><a href="../index.html">Tor</a></h1>@<h1 id="logo"><a target="_blank" href="https://www.torproject.org/">Tor</a></h1>@' *.html
sed -i 's@<li><a href="../index.html">Home</a></li>@<li><a href="../">Home</a></li>@' *.html
sed -i 's@<a href="https://blog.torproject.org/blog/">Blog</a>@<a target="_blank" href="https://blog.torproject.org/blog/">Blog</a>@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">Store</a>@<a target="_blank" href="http://printfection.com/torprojectstore">Store</a>@' *.html
cd ..
#For all other foreign pages (repeated per directory):
#{ar,da,de,es,fa,fr,it,pl,ru}
for i in *.html.ar;do mv "$i" $(basename "$i" .html.ar).ar.html;done
for i in *.html.da;do mv "$i" $(basename "$i" .html.da).da.html;done
for i in *.html.de;do mv "$i" $(basename "$i" .html.de).de.html;done
for i in *.html.es;do mv "$i" $(basename "$i" .html.es).es.html;done
for i in *.html.fa;do mv "$i" $(basename "$i" .html.fa).fa.html;done
for i in *.html.fr;do mv "$i" $(basename "$i" .html.fr).fr.html;done
for i in *.html.it;do mv "$i" $(basename "$i" .html.it).it.html;done
for i in *.html.pl;do mv "$i" $(basename "$i" .html.pl).pl.html;done
for i in *.html.ru;do mv "$i" $(basename "$i" .html.ru).ru.html;done
# 2> /dev/null suppresses error messages about files being non existent and such clutter.
sed -i 's@.html.ar@.ar.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.da@.da.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.de@.de.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.es@.es.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.fa@.fa.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.fr@.fr.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.it@.it.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.pl@.pl.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.ru@.ru.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@.html.en@.html@' *.{ar,da,de,es,fa,fr,it,pl,ru}.html 2> /dev/null
sed -i 's@<a href="https://blog.torproject.org/blog/">@<a target="_blank" href="https://blog.torproject.org/blog/">@' *.html
sed -i 's@<a href="http://printfection.com/torprojectstore">@<a target="_blank" href="http://printfection.com/torprojectstore">@' *.html
# Oops, I forgot to update the English only pages! *hint do the English pages first!*
sed -i 's@.html.ar@.ar.html@' *.html 2> /dev/null
sed -i 's@.html.da@.da.html@' *.html 2> /dev/null
sed -i 's@.html.de@.de.html@' *.html 2> /dev/null
sed -i 's@.html.es@.es.html@' *.html 2> /dev/null
sed -i 's@.html.fa@.fa.html@' *.html 2> /dev/null
sed -i 's@.html.fr@.fr.html@' *.html 2> /dev/null
sed -i 's@.html.it@.it.html@' *.html 2> /dev/null
sed -i 's@.html.pl@.pl.html@' *.html 2> /dev/null
sed -i 's@.html.ru@.ru.html@' *.html 2> /dev/null
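
The per-directory repetition above could be folded into one function. What follows is only a sketch, not what I actually ran: fix_dir is a hypothetical name, it assumes bash and GNU sed, and unlike the commands above it escapes the dots and replaces every occurrence on a line (the g flag).

```shell
#!/bin/bash
# Sketch: rename and relink the pages of one directory in a single pass.
# fix_dir is a hypothetical helper; the language list comes from the post.
langs=(ar da de es fa fr it pl ru)

fix_dir() (                               # subshell: cd and shopt stay local
  cd "$1" || exit 1
  shopt -s nullglob                       # unmatched globs expand to nothing
  for f in *.html.en; do                  # English: page.html.en -> page.html
    mv -- "$f" "${f%.html.en}.html"
  done
  for l in "${langs[@]}"; do              # others: page.html.de -> page.de.html
    for f in *.html."$l"; do
      mv -- "$f" "${f%.html.$l}.$l.html"
    done
  done
  pages=(*.html)
  [ "${#pages[@]}" -gt 0 ] || exit 0      # nothing to relink
  # Rewrite links the same way the files were renamed.
  sed -i -E 's@\.html\.en\b@.html@g; s@\.html\.(ar|da|de|es|fa|fr|it|pl|ru)\b@.\1.html@g' "${pages[@]}"
)

# Demo on a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/docs"
printf '%s\n' '<a href="faq.html.en">FAQ</a> <a href="faq.html.de">DE</a>' > "$demo/docs/faq.html.en"
cp "$demo/docs/faq.html.en" "$demo/docs/faq.html.de"
fix_dir "$demo/docs"
ls "$demo/docs"
cat "$demo/docs/faq.html"
```

On the real mirror this would be called once per directory, for example for d in about docs donate download eff getinvolved press projects torbutton; do fix_dir "$d"; done. The logo, Home, Blog and Store substitutions would still need to be added to the sed call.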

All in all, I didn't manage to adjust this mirror in the intended 15 minutes, as can be read in the link below.
This is mainly because this version features a lot of translations; notably, a lot of them are in Polish.
ofchmod # alias for chmod 644 all files.
find -iname '*.html' -exec sed -i 's@<title>Tor@<title>Please note that you'\''re viewing a local mirror - Tor@' "{}" \;

tidyup.sh  ## See my previous post about mass editing text.
https://bohemian0wildebeest.wordpress.com/2011/01/22/mass-editing-text-files-too/
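
As for the apostrophe in that title edit: when the sed script contains nothing the shell would expand, double quotes are a simpler alternative to the '\'' dance. A minimal sketch on a throwaway file, assuming GNU sed and find:

```shell
#!/bin/bash
# Sketch: the same title edit, with the sed script in double quotes.
demo=$(mktemp -d)
printf '%s\n' '<title>Tor Project: Overview</title>' > "$demo/index.html"

# Double quotes let the apostrophe through untouched. This is safe here only
# because the script contains no $, backtick or backslash for the shell to expand.
find "$demo" -iname '*.html' -exec \
  sed -i "s@<title>Tor@<title>Please note that you're viewing a local mirror - Tor@" {} +
cat "$demo/index.html"
```

Ending -exec with + instead of \; hands many files to a single sed call, which saves a process per file on a large mirror.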

The code looks daunting, doesn’t it? It really isn’t! 😉 If I subtract the time needed to figure things out (which IS the larger part, and we don’t like to repeat ourselves!), this could basically be done in less than 10 minutes (minus the rsync download time, of course). I tracked my own progress while editing, which IS the smarter thing to do: next time I won’t have to go through the figuring-out part again. Last time I didn’t keep track. 😉

Until next time,

Alex
