Using wget to download all links

Wget to the rescue. It's a utility for unix/linux/etc. that goes and gets stuff from Web and FTP servers - kind of like a browser but without actually displaying what it downloads. And since it's one of those awesomely configurable command line programs, there is very little it can't do. So I run wget, give it the URLs to those mp3 blogs, and let it scrape all the new audio files it finds. Then I have it keep doing that on a daily basis, save everything into a big directory, and have a virtual radio station of hand-filtered new music. Neat.

Here's how I do it:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i ~/mp3blogs.txt

And here's what this all means:

-r -H -l1 -np These options tell wget to download recursively. That means it goes to a URL, downloads the page there, then follows every link it finds. The -H tells the app to span domains, meaning it should follow links that point away from the blog. And the -l1 (a lowercase L with a numeral one) means to only go one level deep; that is, don't follow links on the linked site. In other words, these options work together to ensure that you don't send wget off to download the entire Web - or at least as much as will fit on your hard drive. Rather, it will take each link from your list of blogs and download it. The -np switch stands for "no parent", which instructs wget never to follow a link up to a parent directory. (The -t1 caps wget at a single download attempt per file, so a dead link won't hold up the whole run.)

We don't, however, want all the links - just those that point to audio files we haven't yet seen. Including -A.mp3 tells wget to only download files that end with the .mp3 extension. And -N turns on timestamping, which means wget won't download something with the same name unless it's newer.

To keep things clean, we'll add -nd, which makes the app save everything it finds in one directory, rather than mirroring the directory structures of the linked sites. And -erobots=off tells wget to ignore the standard robots.txt files. Normally, this would be a terrible idea, since we'd want to honor the wishes of the site owner; however, since we're only grabbing one file per site, we can safely skip these. Also, along the lines of good net citizenship, we'll add -w5 to wait five seconds between each request so as not to pound the poor blogs.

Finally, -i ~/mp3blogs.txt is a little shortcut. Typically, I'd just add a URL to the command line with wget and start the downloading. But since I wanted to visit multiple mp3 blogs, I listed their addresses in a text file (one per line) and told wget to use that as the input.
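
For reference, here's a sketch of the pieces around that command. The blog URLs and the new-music directory are made up, and the cron schedule is just one way to handle the daily run:

Contents of ~/mp3blogs.txt (one URL per line):

http://mp3blog-one.example.com/
http://mp3blog-two.example.com/

Crontab entry (added via crontab -e) to re-run the scrape every morning at 6:00, saving everything into one big directory:

0 6 * * * cd $HOME/new-music && wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i $HOME/mp3blogs.txt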

Secure Copy (scp) Syntax

Refer: https://phoenixnap.com/kb/linux-scp-command

Copy Files from Remote with Wildcard

scp x_ubuntu1804_ci:"/home/mruckman/crontab/*.zip" .

Single File

scp /your/source/file-to-copy.zip xxx@target.server.com:/tmp/file-to-copy.zip

Single File Copied from Server

scp x_ubuntu1804_ci:/home/mruckman/sos-api-deployment-analysis/server_report.xls ~/Desktop/

Recursive Copy

scp -r user@server1:/var/www/html/ /var/www/ (note: on a RHEL server this created a duplicate of the target folder)

or

scp -r user@server1:/var/www/html/ user@server2:/var/www/html/ (this one is untested)
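
If the duplicated folder is a problem, one workaround (a sketch relying on standard scp wildcard behavior, not something I've run here) is to copy the directory's contents rather than the directory itself:

scp -r user@server1:"/var/www/html/*" /var/www/html/

The quotes keep the local shell from expanding the wildcard so the remote shell does it instead. Note that * will not match hidden (dot) files.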

Format XML with xmllint command

Linux / Unix Command: xmllint

xmllint - command line XML tool

xmllint [--version | --debug | --shell | --debugent | --copy
| --recover | --noent | --noout | --htmlout | --nowrap
| --valid | --postvalid | --dtdvalid URL | --timing
| --repeat | --insert | --compress | --sgml | --html
| --push | --memory | --nowarning | --noblanks | --format
| --testIO | --encode encoding | --catalogs | --nocatalogs
| --auto | --xinclude | --loaddtd | --dtdattr | --dropdtd
| --stream | --chkregister] [xmlfile]

Example:
$ xmllint --format summary.xml > ~/Desktop/summary-format.xml
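
A couple of other invocations built from the same option list can be handy; the file names and URL here are placeholders:

$ xmllint --noout --valid summary.xml
$ curl -s http://example.com/feed.xml | xmllint --format -

The first validates the document against its DTD and prints nothing on success; the second pretty-prints XML arriving on a pipe (the trailing - tells xmllint to read standard input).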

Grep file and include lines around search result

grep -A 1 -i "Search-for-something" /var/lib/jbossas/server/halprdjbs01/log/server.log

Display N lines after match

-A is the option that prints the specified N lines after each match, as shown below.

Syntax:
grep -A <N> "string" FILENAME

The following example prints the matched line, along with the 3 lines after it.

$ grep -A 3 -i "example" demo_text

Display N lines before match

-B is the option which prints the specified N lines before the match.

Syntax:
grep -B <N> "string" FILENAME

Just as -A shows the lines after a match, -B shows the lines before it.

$ grep -B 2 "single WORD" demo_text

Display N lines around match

-C is the option that prints N lines of context on both sides (before and after) of the match. Use it when you want to see the lines surrounding each match.

$ grep -C 2 "Example" demo_text
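
These context flags combine with the rest of grep's options as you'd expect. For example, a sketch pairing context with line numbers and a case-insensitive match (the search term here is a placeholder):

$ grep -n -C 2 -i "exception" /var/lib/jbossas/server/halprdjbs01/log/server.log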

Ubuntu Firewall Fix in Virtualbox

The following was done to allow Ubuntu 10.10 access to repositories through a firewall; it is confirmed to also work on Ubuntu 12.04.

1. Update proxy configurations

1.1. System, Preferences, "Network Proxy", "Manual Proxy Configurations", "Use the same...", "Apply System Wide..."
1.2. System, Administration, "Synaptic Package Manager", "Settings", "Preferences", "Network", "HTTP Proxy:"

2. Edit /etc/apt/apt.conf as root (e.g., under sudo su).
You should already see the proxy settings from the previous steps; however, the file is missing your credentials. These need to be added, and the appliance needs to be restarted.

Format is as follows:
Acquire::http::proxy "http://user:pass@proxy.xxxx.xxxx:port";
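
As a concrete sketch, with placeholder host, port, and credentials:

Acquire::http::proxy "http://jdoe:secret@proxy.example.com:8080";
Acquire::https::proxy "http://jdoe:secret@proxy.example.com:8080";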

3. You will need to restart before these changes take effect.

Note: Do not bother editing /etc/bash.bashrc; it no longer helps.

Linux search commands

Use the following two commands to research program usage in Linux:

The following shows the location of the program that actually runs when you type the command:
$ which command-to-lookup

The following shows the other places on the system where files matching that name live (it searches a prebuilt index rather than the live filesystem):
$ locate command-to-lookup
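
For example (the output shown is from one machine and will differ on yours; locate reads an index that updatedb refreshes):

$ which grep
/usr/bin/grep
$ sudo updatedb
$ locate bin/grep
/usr/bin/grep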

 

Setup Subversion (SVN) Server in Ubuntu 12.04

1. Install Subversion
sudo apt-get install subversion libapache2-svn apache2

2. Decide where you want to keep the repository
sudo mkdir /svn
sudo gedit /etc/apache2/mods-enabled/dav_svn.conf

Delete everything in the file and replace it with this:

<Location /svn>
DAV svn
SVNParentPath /svn
AuthType Basic
AuthName "Subversion Repository"
AuthUserFile /etc/apache2/dav_svn.passwd
Require valid-user
</Location>

3. Create a user (the -c flag creates the password file, so drop it when adding more users later)
sudo htpasswd -cm /etc/apache2/dav_svn.passwd username

4. Setup first repository
cd /svn
sudo svnadmin create test

5. Make sure you have all of the proper permissions for the repository
sudo chown -R www-data:www-data /svn

6. Restart Apache service
sudo /etc/init.d/apache2 restart

7. You are now ready to use TortoiseSVN for Windows or RapidSVN for Linux

8. Your repository URL is http://localhost/svn/test
Note: It looks like RapidSVN does not like spaces in file names.
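
To sanity-check the new repository from the command line before reaching for a GUI client (the working-copy name and file here are arbitrary):

svn checkout http://localhost/svn/test test-wc --username username
cd test-wc
echo "hello" > README.txt
svn add README.txt
svn commit -m "initial commit"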

Please see PDF for original notes.