Mass editing text files.


First:
Google is your friend.

Second:
As if this weren’t obvious!? I am still LEARNING, and as such I will arrive at different conclusions from time to time. See here for more on how I view this or, better put, how the author of this fine article defines it! :-)

Q: Why the need for such tools?
A: I maintain a relatively large amount of self-study material, i.e. I mirror a few sites that have (once) piqued my interest. And because “I am that guy”, I allow you to peek at those as well.

Third and last: …

All others just pop open the hood and watch and learn (as if? :lol:)

On to the scripts:

Tidy Up script goes here:

#!/bin/bash
# Modified: Today by E.L.F.
#
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 2 of the License, or (at your option) any later
## version.
#
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
## FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
#
## You should have received a copy of the GNU General Public License along with
## this program; if not, write to the Free Software Foundation, Inc., 51
## Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
## http://www.gnu.org/copyleft/gpl.html
#
## Script-name - tidyup.sh
## Requires tidy and optionally tidy-doc
# apt-cache search tidy
# sudo apt-get install tidy tidy-doc
## Or if you use shorthand as I do. ; - )
# aptsearch tidy tidy-doc
# Install tidy tidy-doc
## Needs the following lines to be present in $HOME/.bash_aliases though.
## To search
# alias aptsearch='apt-cache search '
## When an exact match show package info for.
# alias aptshow='apt-cache show '
## To install
# alias Install='echo "sudo apt-get install " && sudo apt-get install'
RED="\033[0;31m"
BLUE="\033[1;34m"
CYAN="\033[1;36m"
YELLOW="\033[1;33m"
NC="\033[0m"
if [ "$USER" = root ]; then
  echo -e $RED"   Are you Insane!"
  echo -e $CYAN"   Error: In order to use this script, one must NOT be $USER"
  echo -e $YELLOW"    Exiting..."$NC
  exit 1   # refusing to run is a failure, so report a non-zero status
else
  echo ""
  echo -e $BLUE"    $USER may proceed."
  echo -e $CYAN"    May peace be with you."$NC
fi
clear
H="html.txt"
find . -iname '*.html' | sort > "$H"
## Reading with IFS= and -r, plus quoting, keeps filenames with ugly spaces
## (or backslashes) intact. Sweet!
## I use $HOME/.usr/bin contrary to proper Ubuntu usage whereas they use
## simply $HOME/bin for user land scripts!
config="$HOME/.usr/bin/etc/config.txt"
while IFS= read -r line; do
  tidy -config "$config" "$line"
done < "$H"
rm "$H"
exit 0
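
The temporary html.txt list can be avoided altogether. What follows is only a minimal sketch, not part of tidyup.sh: each_html is a made-up helper name, and it assumes GNU find and sort (for -print0 and -z), so that even filenames containing spaces or newlines reach the command in one piece.

```bash
#!/bin/bash
# each_html is a hypothetical helper, not part of tidyup.sh.
# It runs the given command once for every *.html file below a directory.
# Assumes GNU find and sort (for -print0 and -z).
each_html() {
  local dir=$1
  shift
  find "$dir" -type f -iname '*.html' -print0 | sort -z |
    while IFS= read -r -d '' file; do
      # Redirect stdin so the command cannot swallow the file list.
      "$@" "$file" < /dev/null
    done
}

# Example (config path taken from the script above):
# each_html . tidy -config "$HOME/.usr/bin/etc/config.txt"
```

The command to run is passed as arguments, so the same helper could drive sed or gzip instead of tidy.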

The config.txt the previous script referred to goes here:

// sample config file for HTML tidy
doctype: transitional
indent: auto
indent-spaces: 2
wrap: 76
markup: yes
bare: yes
clean: yes
preserve-entities: yes
output-xml: no
input-xml: no
output-xhtml: yes
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: utf8
input-encoding: utf8
output-bom: auto
output-encoding: utf8
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
repeated-attributes: keep-last
error-file: errs.txt
write-back: yes

The eradicate empty lines and white space from file(s) script goes here:

#!/bin/bash
# Modified: Today by E.L.F.
#
## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU General Public License as published by the Free Software
## Foundation; either version 2 of the License, or (at your option) any later
## version.
#
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
## FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
## details.
#
## You should have received a copy of the GNU General Public License along with
## this program; if not, write to the Free Software Foundation, Inc., 51
## Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
## http://www.gnu.org/copyleft/gpl.html
#
## Script-name - empty.lines.sh
#
## Q: Why not configure tidy to do exactly that!?
## A: I may want to use this on other files of type text/plain,
## which afaik tidy doesn't parse.
## And why should tidy parse other files of type text/plain anyway?
RED="\033[0;31m"
BLUE="\033[1;34m"
CYAN="\033[1;36m"
YELLOW="\033[1;33m"
NC="\033[0m"
if [ "$USER" = root ]; then
  echo -e $RED"   Are you Insane!"
  echo -e $CYAN"   Error: In order to use this script, one must NOT be $USER"
  echo -e $YELLOW"    Exiting..."$NC
  exit 1   # refusing to run is a failure, so report a non-zero status
else
  echo ""
  echo -e $BLUE"    $USER may proceed."
  echo -e $CYAN"    May peace be with you."$NC
fi
clear
  ## Removing nonsense encoding schemes.
  ## Insert your own strings here.
#find -iname '*.html' -exec sed -i 's/<?xml version="1.0" encoding="utf-8" standalone="no"?>//' "{}" \;
#find -iname '*.html' -exec sed -i 's/<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>//' "{}" \;
#find -iname '*.html' -exec sed -i 's@/glossary/@@' "{}" \;
  ## Removing the excess whitespace (if applicable?) for you.
  # sed -i 's/[ \t]*$//' # see note on '\t' at end of file
## Of course I could use a case-select scenario for several filetypes as well.  
find -iname '*.html' -exec sed -i 's/[ \t]*$//' "{}" \;
  ## Removing those empty lines for you.
find -iname '*.html' -exec sed -i '/^$/d' "{}" \;
#   \rm *.bak
exit 0
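
Both passes can be folded into a single sed call, and the file type can become a parameter (the “case-select” idea from the comments). A minimal sketch, assuming GNU sed (for -i); strip_blank is a made-up name, and [[:space:]] is my substitution for [ \t], which also catches stray carriage returns.

```bash
#!/bin/bash
# strip_blank is a hypothetical helper, not part of empty.lines.sh.
# Usage: strip_blank DIR EXT
strip_blank() {
  local dir=$1 ext=$2
  # First expression: drop trailing whitespace.
  # Second expression: delete lines that are (now) empty.
  # "{} +" hands many files to one sed process instead of one per file.
  find "$dir" -type f -iname "*.$ext" \
    -exec sed -i -e 's/[[:space:]]*$//' -e '/^$/d' {} +
}

# Example: strip_blank . html
```

Lines holding nothing but whitespace disappear as well, just as they do after the two separate passes.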

Compress, remove and link file(s) script?
One can expect a lot of “file not found” errors and other crap when using the following script, because it is FAR from complete, but I am using it anyway.
You’ve been properly warned!!

No, it won’t do that anymore. I’ve added a nice “fun roll loop” to the script. Be aware, though, that the user still needs to know in advance which types of files to choose from (which I assume you do)!

But!

Please note that I already know one can achieve the same thing with the following three lines.

# for i in {*.html,*.js,*.css,*.txt,*.xml}; do gzip -c --best $i > $i.gz;done
# for i in {*.html,*.js,*.css,*.txt,*.xml}; do \rm $i;done
# for i in {*.html.gz,*.js.gz,*.css.gz,*.txt.gz,*.xml.gz}; do ln -s -T "$i" $(basename "$i" .gz);done

I can see a form of elegance in these, as opposed to the rather elaborate ‘function’ that “my” script has now become. :-) After all, they need no more than three lines to achieve the goal I had in mind. *YaY* The problem with the above-mentioned tactic is that it is rather “brute force” (hence the del’ed warning): a pattern that matches no files is handed to the commands literally, so, depending on the amount of files processed, the user can be left with a lot of empty files and links to those files. Now on my system this won’t have any adverse effects, but I don’t know how this would be on yours (think filesystem, for example)!?

Aside from me wanting to keep things SIMPLE ;-), it would be neat as well if this script didn’t exit when a file simply doesn’t exist (or leave you with empty files), but simply returned to the top, something like ‘goto top’ for example. Will work on that later! Both on the script and on my formatting skills. ;-) (Note to self: Must close every html element properly. :lol:)
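
The empty files come from patterns that match nothing: by default bash then hands the pattern itself (say *.js) to gzip. Here is a minimal sketch of the three lines with that hole plugged, assuming bash and GNU ln (for -T); compress_and_link is a made-up name and this is not the script that follows.

```bash
#!/bin/bash
# compress_and_link is a hypothetical helper. Usage: compress_and_link EXT...
# It works on the current directory, like the three lines above.
# The body runs in a subshell ( ... ), so nullglob is switched on only here.
compress_and_link() (
  shopt -s nullglob              # unmatched patterns now expand to nothing
  for ext in "$@"; do
    for f in *."$ext"; do
      [ -f "$f" ] || continue    # skip directories and the like
      [ -L "$f" ] && continue    # already a link from an earlier run
      # Only remove and link when the compression succeeded.
      gzip -c --best "$f" > "$f.gz" && rm -- "$f" && ln -s -T "$f.gz" "$f"
    done
  done
)

# Example: compress_and_link html js css txt xml
```

Skipping existing links matters on a second run: compressing through the link would truncate the very .gz file it points to.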

Has been moved to there:

#!/bin/bash
## Modified: Today by E.l.f.
#
## Script-name - ziprmlink.sh
#
## Source:
# http://tldp.org/LDP/abs/html/loops1.html
RED="\033[0;31m"
BLUE="\033[1;34m"
CYAN="\033[1;36m"
YELLOW="\033[1;33m"
NC="\033[0m"
if [ "$USER" = root ]; then
  echo -e $RED"   Are you Insane!"
  echo -e $CYAN"    Error: In order to use this script, one must NOT be $USER"
  echo -e $YELLOW"    Exiting..."$NC
  exit 1   # refusing to run is a failure, so report a non-zero status
else
  echo ""
  echo -e $BLUE"    $USER may proceed."
  echo -e $CYAN"    May peace be with you."$NC
fi
clear # Clear the screen.
TIME=1
RePeat ()
{ # A somewhat more complex function.
a=0
## Because we only have 5 possible selections.
REPEATS=5
sleep $TIME    # Hey, wait a second!
  while [ $a -lt $REPEATS ]
    do
      echo "-------------------Mass Edit Options List-------------------"
      echo
      echo "    Please choose one of the following options:              "
      echo "                              Or                             "
      echo "    Hit Ctrl+c (^c) to stop now."
      echo "    \"UPPER\", \"lower\" and \"Capitalized\" spelling "
      echo "    are valid forms of input."
      echo
      echo "    Usage: "
      echo "    html"
      echo "    text"
      echo "    js"
      echo "    xml"
      echo "    css"
      echo
      echo "-------------------------------------------------------------"
      echo ""
      read -r Choice
      ext=""
      case "$Choice" in
      "HTML" | "Html" | "html") ext="html" ;;
      "TXT" | "Txt" | "txt" | "TEXT" | "Text" | "text") ext="txt" ;;
      "CSS" | "Css" | "css") ext="css" ;;
      "XML" | "Xml" | "xml") ext="xml" ;;
      "JS" | "Js" | "js") ext="js" ;;
      *)
      echo
      echo "Please choose a valid option."
      echo "\"UPPER\", \"lower\" and \"Capitalized\" spelling are supported"
      echo "    Usage: "
      echo "    html"
      echo "    text"
      echo "    js"
      echo "    xml"
      echo "    css"
      echo
      ;;
      esac
      if [ -n "$ext" ]; then
        echo -e $BLUE"    You've chosen ."$Choice" file(s) to work on."$NC
        echo -e $RED"Executing..."$NC
        ## Compress each file, remove the original and link its name to the
        ## .gz copy. Quoting keeps names with spaces intact.
        for i in *."$ext"; do
          ## Skip the literal pattern (no match) and links from an earlier run.
          [ -f "$i" ] && [ ! -L "$i" ] || continue
          gzip -c --best "$i" > "$i.gz" && \rm "$i" && ln -s -T "$i.gz" "$i"
        done
      fi
    let "a+=1"
  done
}
## Repeat 5 times which for the user would depend on the need or amount of
## filetypes that need to be processed.
RePeat
exit 0
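
The “UPPER”, “lower” and “Capitalized” patterns can be collapsed by lowercasing the input first. A minimal sketch, assuming bash 4 or newer (for ${var,,}); normalize_type is a made-up name and not part of ziprmlink.sh. Note that it also accepts mixed spellings such as hTmL, which the script above does not.

```bash
#!/bin/bash
# normalize_type is a hypothetical helper, not part of ziprmlink.sh.
# It prints the file extension for a menu choice in any spelling and
# returns 1 when the choice is not on the list.
normalize_type() {
  local choice=${1,,}          # lowercase the input (bash 4+)
  case "$choice" in
    html|js|xml|css) echo "$choice" ;;
    txt|text)        echo "txt" ;;
    *)               return 1 ;;
  esac
}

# Example:
# read -r Choice
# ext=$(normalize_type "$Choice") || echo "Please choose a valid option."
```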

The above-mentioned script loops 5 times (or until you hit ^c/ctrl+c) and is as such pretty handy. But why not have “one command to rule them all”? Well, here it is (almost). :lol: I got the inspiration from here. Also, to repeat what I wrote on top: as if this weren’t obvious by now!? I am still LEARNING and as such will arrive at different conclusions from time to time. See here for more on how I view this or, better put, how the author defines it!

Basically this adds extra functionality to bash and keeps things organized at the same time!

# Functions: check for a separate function file, and if we find one
# source it.
if [[ -f ~/.bash_functions ]]; then
    . ~/.bash_functions
fi
massedit()
{
RED="\033[0;31m"
BLUE="\033[1;34m"
CYAN="\033[1;36m"
YELLOW="\033[1;33m"
NC="\033[0m"
if [ "$USER" = root ]; then
  echo -e $RED"   Are you Insane!"
  echo -e $CYAN"    Error: In order to use this script, one must NOT be $USER"
  echo -e $YELLOW"    Exiting..."$NC
  return 1   # "exit" inside a sourced function would close the whole shell
else
  echo ""
  echo -e $BLUE"    $USER may proceed."
  echo -e $CYAN"    May peace be with you."$NC
fi
clear # Clear the screen.
  local ext=""
  case "$1" in
  "HTML" | "Html" | "html") ext="html" ;;
  "TXT" | "Txt" | "txt" | "TEXT" | "Text" | "text") ext="txt" ;;
  "CSS" | "Css" | "css") ext="css" ;;
  "XML" | "Xml" | "xml") ext="xml" ;;
  "JS" | "Js" | "js") ext="js" ;;
  *)
  echo
  echo "Please choose a valid option."
  echo "\"UPPER\", \"lower\" and \"Capitalized\" spelling are supported"
  echo -e $YELLOW"    Usage$NC: "
  echo -e $CYAN"    massedit$NC$BLUE html$NC"
  echo -e $CYAN"    massedit$NC$BLUE text$NC"
  echo -e $CYAN"    massedit$NC$BLUE js$NC"
  echo -e $CYAN"    massedit$NC$BLUE xml$NC"
  echo -e $CYAN"    massedit$NC$BLUE css$NC"
  echo
  return 1
  ;;
  esac
  if [ ! "$(ls *."$ext" 2>/dev/null)" ]
  then
  echo -e $YELLOW"$1 does not exist."$NC; echo
  else
  echo -e $BLUE"    You've chosen \"$(ls *."$ext" | wc -l)\" file(s) to work on."$NC
  echo -e $RED"Executing..."$NC
  ## Compress each file, remove the original and link its name to the .gz copy.
  for i in *."$ext"; do
    ## Skip anything that is not a regular file, and links from an earlier run.
    [ -f "$i" ] && [ ! -L "$i" ] || continue
    gzip -c --best "$i" > "$i.gz" && \rm "$i" && ln -s -T "$i.gz" "$i"
  done
  fi
}

Yes, I know it is still QUITE elaborate! But it serves its purpose. ;-) One simply cd’s into the folder of his or her choosing and then runs massedit $arg (as in one of html, text, js, xml or css, but not all at the same time (yet)).
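
The “not all at the same time (yet)” part could be covered by a small wrapper. A minimal sketch, assuming the massedit function above is already loaded in the shell; massedit_all is a made-up name.

```bash
#!/bin/bash
# massedit_all is a hypothetical wrapper around the massedit function above.
# It calls massedit once for each file type given on the command line.
massedit_all() {
  local arg
  for arg in "$@"; do
    massedit "$arg"
  done
}

# Example: massedit_all html css js
```

Since massedit clears the screen on every call, only the output for the last type stays visible.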

This is also a cute function. For those times you only want to edit ONE file. ;-)
This should be seen as a companion to the tidyup.sh script I wrote about earlier.

## Or maketidy if you prefer.
mtidy()
{
## This is where I keep my tidy config file.
## Your path may differ.
config="$HOME/.usr/bin/etc/config.txt"
## Quoted, so a path or filename with spaces arrives as one argument.
tidy -config "$config" "$1"
}
  • Q: Why the linking of *.html to *.html.gz files?
  • LA:
    Because I use thttpd for serving up static content from within the comforts of my home, and because I know thttpd supports sending content in gzip format (one simply appends the .gz extension).
  • Furthermore:
  • I believe this will greatly compensate for my poor upload speed. I also don’t want to frustrate search engines or the casual visitor(s) with the all too well-known 404 Not Found error. Therefore I’ve decided to make things work “seamlessly”, without you knowing, aside from me telling you now. It does require a bit of tinkering with the config though, might I add.
  • Q: Why not use cgi instead, I know thttpd supports that?
  • LA: Hmm… (.redirect comes to mind.) True, but a server running at home might need to be VERY secure, and
    I am not willing to compromise security for convenience (yet).
    Hence a chroot’ed environment with no executables present within the view
    of the program and, better yet, not even outbound access, might I add, for the user
    running the program.
  • SA: Because I choose to. :-D
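
Since every page is now a link to its .gz twin, a link whose target has gone missing would presumably bring the 404 right back. A minimal sketch for spotting those, assuming GNU find (for -xtype); dangling_links is a made-up name.

```bash
#!/bin/bash
# dangling_links is a hypothetical helper. Usage: dangling_links [DIR]
# It lists symbolic links below DIR (default: the current directory)
# whose target does not exist. Assumes GNU find, which provides -xtype.
dangling_links() {
  find "${1:-.}" -xtype l
}

# Example: dangling_links /path/to/mirror
```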







PS: Constructive criticism/feedback will be (re)posted, all other(s) will be dropped HARD to the floor.

3 thoughts on “Mass editing text files.”

  1. Though there may perhaps be a few good reasons (I haven’t found them yet!) for wanting a script to run “unattended”,
    my common sense dictates that convenience and pleasure don’t always mix.

    For instance I want to know what my computer is doing as I run this or that (shell/any) script.
    Therefore for each and every script that I have written and will write, there will always be a need for user input!

    Semi automatic vs fully automatic, you decide! :-)

    Don’t like it? Write your own then!
