Rough notes on moving a Tumblr blog to Blogger (including rehosting images and maintaining source links)

The following are some very rough notes on moving a blog from Tumblr to Blogger, including rehosting images and preserving source links. They were written for personal use but may help others. They make use of Bash and GNU tools. The generic procedures documented in the links below work fine and should give satisfactory results for most people.
how-to #1: http://www.analyticsforfun.com/2014/04/how-to-move-your-blog-from-tumblr-to.html
how-to #2: https://yourbusiness.azcentral.com/import-tumblr-blogger-10881.html
The procedures in those links leave the images hosted on Tumblr and also strip the 'source' URL from each post's content. The Bash snippets included here fix these issues, with only a few potential flaws that can easily be cleaned up manually. It would have been more proper to create the XML files from scratch, or at least to manipulate the resulting XML objects directly. This was considered along the way but abandoned in favour of simple Bash (sed et al.). Some alternative XML-manipulation instructions (using xmlstarlet) are listed below.

# N.B. Hard-coded values (e.g. the 'oioiiooixiii' user ID and file names) are present throughout.

###### Step 1
Download the Tumblr account as an XML file via https://tumblr2wordpress.benapps.net/
###### Step 2 - Get all image urls

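# Split on double quotes so each URL lands on its own line, keep image URLs
# (dots escaped so '.' only matches a literal full stop), remove duplicates
tr \" \\n < blogger-export.xml \
| grep '\.jpg\|\.gif\|\.png' \
| awk '!x[$0]++' > images.list

# And if needed, do further URL cleaning, such as dropping lines containing '{', '\' or ';'
grep '\.jpg\|\.gif\|\.png' < images.list \
| grep -v \{ \
| grep -v '\\' \
| grep -v \; > revised-images.list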
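An alternative to splitting on quotes with `tr` is to let GNU grep print only the matching URL text with `-o`. This is a minimal sketch, assuming image URLs start with http(s), end in .jpg/.gif/.png and are terminated by a quote, angle bracket or whitespace; the function name `extract_image_urls` is hypothetical.

```bash
# Sketch (hypothetical helper): print each distinct image URL found in an
# XML/HTML file, in order of first appearance. Assumes URLs are http(s), end in
# .jpg/.gif/.png and stop at a quote, angle bracket or whitespace.
extract_image_urls()
{
   grep -oE 'https?://[^"<>[:space:]]+\.(jpg|gif|png)' "$1" \
   | awk '!x[$0]++'   # drop duplicates, keep the first occurrence
}

# Usage: extract_image_urls blogger-export.xml > images.list
```

Because each URL is cut out of the line directly, stray fragments (such as the '{', '\' or ';' lines filtered above) should be less likely to appear, though the output is still worth a glance.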
###### Step 3 - Get all tumblr post urls

tr \> \\n < tumblr_oioiiooixiii.xml \
| grep oioiiooixiii.tumblr.com/post \
| cut -d\< -f 1 \
| awk '!x[$0]++' > tumblr-post.urls
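Later steps key everything on the numeric post ID (downloaded files are treated as named after it, and it is matched against each item's `<wp:post_name>`). A minimal sketch for deriving the IDs from the URL list, assuming every post URL has the form `.../post/<digits>`, optionally followed by a slug; `post_ids` and the output file name `tumblr-post.ids` are hypothetical.

```bash
# Sketch (hypothetical helper): print the numeric post ID from each Tumblr post
# URL read on stdin. Assumes URLs look like http(s)://<user>.tumblr.com/post/<digits>[/<slug>];
# lines without '/post/<digits>' are skipped.
post_ids()
{
   sed -nE 's|.*/post/([0-9]+).*|\1|p'
}

# Usage: post_ids < tumblr-post.urls > tumblr-post.ids
```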
###### Step 4 - Get source html from tumblr posts

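# Fetch one post's HTML into a variable (repeat in a loop over tumblr-post.urls)
html="$(wget -qO- http://oioiiooixiii.tumblr.com/post/149805370851)"
# OR download every post in the list (run from a sub-directory): skip files that
# already exist (-nc) and wait a randomised interval around 1.5 s between requests
wget -nc -w 1.5 --random-wait -i ../tumblr-post.urls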
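The single-post `wget -qO-` command can also be wrapped in a loop instead of using `wget -i`. A minimal sketch, assuming each line of the URL file is a post URL with the numeric ID directly after `/post/`; each page is saved under that ID so the file names can later serve as post IDs. `fetch_posts` and the fixed 1-second pause are illustrative choices, not part of the original procedure.

```bash
# Sketch (hypothetical helper): download each Tumblr post listed in a URL file,
# saving it under its numeric post ID, assumed to follow '/post/' in each URL.
# Existing files are skipped, and the loop pauses between requests.
fetch_posts()
{
   local url id
   while IFS= read -r url
   do
      id="$(sed -nE 's|.*/post/([0-9]+).*|\1|p' <<<"$url")"
      [[ -z "$id" || -e "$id" ]] && continue   # not a post URL, or already fetched
      # wget -O creates the file even on failure, so remove it if the fetch fails
      wget -q -O "$id" "$url" || { rm -f "$id"; echo "FAILED: $url" >&2; }
      sleep 1
   done < "$1"
}

# Usage (from an empty download directory): fetch_posts ../tumblr-post.urls
```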
###### Step 5 - Get source URL from each tumblr post

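urlDecode() # https://unix.stackexchange.com/a/187256
{
   # First pass: Perl's URI::Escape decodes %XX sequences
   local firstPass=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$1")
   # Second pass: '+' becomes a space, remaining %XX are expanded by printf '%b'
   local urlEncoded="${firstPass//+/ }"
   printf '%b' "${urlEncoded//%/\\x}"
}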

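getOriginalUrl() # html parser, prints 'postID|||sourceURL' ('|||' is data separator)
{
   local userID="oioiiooixiii"
   local i raw

   # Run from the directory of downloaded posts (each file name is a post ID)
   for i in *
   do
      # Put the source <div> on a new line (GNU sed '\n'), drop '</a>',
      # keep only links that do not point back to the user's own blog
      raw="$(sed -e 's/<div class="cont content_source">/\n/g' \
                 -e 's/<\/a>//g' "$i" \
            | grep '<a href="http' \
            | grep -v "<a href=\"http://${userID}" \
            | cut -d\" -f 2)"

      # If URL is encoded, decode it twice (for double-encoded URLs);
      # '%s' keeps any '%' in the URL from being read as a printf format
      if [[ $raw == *"%"* ]]
      then
         printf '%s|||%s\n' "$i" "$(urlDecode "$(cut -d= -f2 <<<"$raw" \
                                                | cut -d\& -f1 \
                                                | head -1)")"
      else
         printf '%s|||%s\n' "$i" "$(head -1 <<<"$raw")"
      fi
   done
}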
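For systems without Perl's URI::Escape module, the extract-and-decode idea can be sketched in pure Bash. This assumes the external link is wrapped in a Tumblr redirect of the form `...?z=<percent-encoded URL>&t=...`, which is the shape implied by the `cut -d=` and `cut -d\&` calls above; `decode_redirect` is a hypothetical name, and it decodes only once.

```bash
# Sketch (hypothetical helper): pull the 'z' parameter out of an assumed Tumblr
# redirect URL and percent-decode it once using only Bash ('+' becomes a space).
decode_redirect()
{
   local z="${1#*\?z=}"          # drop everything up to and including '?z='
   z="${z%%&*}"                  # drop the first '&' and everything after it
   z="${z//+/ }"                 # '+' encodes a space in query strings
   printf '%b\n' "${z//%/\\x}"   # turn each %XX into \xXX for printf to expand
}

# Usage: decode_redirect 'https://t.umblr.com/redirect?z=https%3A%2F%2Fexample.com&t=abc'
```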

# Clean up missing source links by falling back to the Tumblr post URL
while IFS= read -r line
do
   postID=${line%|||*}
   url=${line##*|||}
   printf '%s|||' "$postID"

   # If the existing URL doesn't start with http(s):// (e.g. it is blank)
   if [[ "$url" =~ ^https?:// ]]
   then
      printf '%s\n' "$url"
   else
      printf 'http://oioiiooixiii.tumblr.com/post/%s\n' "$postID"
   fi
done < source-COPY.urls > source-FIXED.urls
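Before using the fixed list, it may be worth confirming that every line now has the expected `postID|||URL` shape. A minimal sketch, assuming post IDs are purely numeric; `check_source_list` is a hypothetical helper that prints offending lines and returns non-zero if any are found.

```bash
# Sketch (hypothetical helper): print lines that are not '<digits>|||http(s)://...'.
# In grep's ERE, '\|' is a literal pipe. Returns 1 if any malformed line exists.
check_source_list()
{
   local status=0
   grep -vE '^[0-9]+\|\|\|https?://' "$1" || status=$?
   case "$status" in
      0) return 1 ;;   # grep printed at least one malformed line
      1) return 0 ;;   # every line matched the expected shape
      *) return 2 ;;   # grep error, e.g. the file could not be read
   esac
}

# Usage: check_source_list source-FIXED.urls && echo "all lines OK"
```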

###### Step 6 - Rehost images

# Upload the images via Blogger post(s), or via Google Photos
###### Step 7 - Get URLs of uploaded images

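# Add all images to one, or multiple, blog posts (this can be tedious and laborious).
# Save the source HTML of these posts, then run the following. The input must not be
# rehosted-images.urls itself: the '>' redirect truncates that file before it is read.
# ('rehosted-posts.html' is a placeholder name for the saved HTML.) Rehosted images
# keep their 'tumblr_...' file names; 'grep 1600' keeps the large 's1600' links.
tr \" \\n < rehosted-posts.html \
| grep tumblr \
| grep 1600 \
| awk '!x[$0]++' > rehosted-images.urls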
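Because the swap in Step 8 matches images by file name, it can help to preview the old-to-new pairing first and spot images with no rehosted copy, or with several candidates. A sketch, assuming (as the swap loop does) that a rehosted URL contains the original file name; `preview_image_map` is a hypothetical name, and its `original|||new` output simply reuses the notes' separator.

```bash
# Sketch (hypothetical helper): for each original image URL, count the rehosted
# URLs containing its file name. Prints 'original|||new' for a unique match,
# and flags missing or ambiguous matches instead.
preview_image_map()
{
   local original filename matches count
   while IFS= read -r original
   do
      filename="${original##*/}"
      matches="$(grep -F -- "$filename" "$2" || true)"
      count="$(grep -c . <<<"$matches" || true)"
      case "$count" in
         1) printf '%s|||%s\n' "$original" "$matches" ;;
         0) printf 'NO MATCH: %s\n' "$original" ;;
         *) printf 'AMBIGUOUS (%s): %s\n' "$count" "$original" ;;
      esac
   done < "$1"
}

# Usage: preview_image_map original-images.urls rehosted-images.urls
```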
###### Step 8 - Replacing old image URLs with new image URLs

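# tumblr archive XML: image URLs replaced with corresponding Google URLs via sed
while IFS= read -r original
do
   filename="${original##*/}"
   # Fixed-string match on the file name; take the first hit only
   new="$(grep -F -m1 -- "$filename" < rehosted-images.urls)"

   # Skip (and log) unmatched images rather than replacing the URL with nothing
   if [[ -z "$new" ]]
   then
      echo "NO MATCH: $original" | tee -a image-swapping.log
      continue
   fi

   echo "SWAPPING: $original FOR: $new" | tee -a image-swapping.log

   sed -i "s|$original|$new|g" tumblr_oioiiooixiii-NEW-IMG-URLS.xml
done < original-images.urls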
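Running `sed -i` once per image rewrites the whole XML file each time. An alternative sketch (a swapped-in technique, not the method used above) generates one sed script from a hypothetical `original|||new` mapping file, one pair per line, and applies it in a single pass. It escapes regex metacharacters in the old URL, and `&`, `\` and the `|` delimiter in the new one; `build_swap_script` and the mapping format are assumptions.

```bash
# Sketch (hypothetical helper): turn 'original|||new' lines into sed commands
# 's|old|new|g', escaping the old URL for a basic regex and the new URL for a
# replacement, so characters such as '.', '&' and '|' are treated literally.
build_swap_script()
{
   local line original new
   while IFS= read -r line
   do
      original="${line%%|||*}"
      new="${line#*|||}"
      [[ -n "$original" && -n "$new" ]] || continue   # never blank out a URL
      original="$(sed 's/[][\.*^$|]/\\&/g' <<<"$original")"
      new="$(sed 's/[\&|]/\\&/g' <<<"$new")"
      printf 's|%s|%s|g\n' "$original" "$new"
   done < "$1"
}

# Usage: build_swap_script image-map.txt > swap.sed
#        sed -i -f swap.sed tumblr_oioiiooixiii-NEW-IMG-URLS.xml
```

Escaping the dots also stops a URL such as `t.example` from matching `tXexample`, which the unescaped pattern in the loop would allow.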
###### Step 9 - Remove fluff and ready XML file for source link html

# Adding XML content would be best done with an XML editor such as 'xmlstarlet',
# but because it strips '<![CDATA[' sections, and for fear of issues arising
# during conversion, editing was kept at a low level with basic Bash, sed, etc.

# Examples of editing data via 'xmlstarlet'
# Find a specific item by its 'wp:post_name' and print its post content,
# with HTML entities decoded via Perl
xmlstarlet sel -t -m "/rss/channel/item[wp:post_name='65528357819']" \
-v 'content:encoded' test.xml | perl -MHTML::Entities -pe 'decode_entities($_);'
# Replace the content of that same item only (the edited XML goes to stdout)
xmlstarlet ed -u "/rss/channel/item[wp:post_name='65528357819']/content:encoded" \
-v "newHTMLtext" test.xml

# Instead:
# Use sed to remove 'figure' tags, and ']]></content:encoded>'.
# The idea is that the source-URL HTML will be added back, together with
# ']]></content:encoded>', just before the post ID in each 'item' object,
# thus re-encapsulating the HTML data. (Output goes to stdout; redirect it
# to a new file.)
sed -e 's/<div class="figure"><figure>//g' \
    -e 's/<\/figure><\/div>//g' \
    -e 's/]]><\/content:encoded>//g' \
    -e '/^\s*$/d' < tumblr_oioiiooixiii.xml
###### Step 10 - Add source URLs to each post

# Read each line ('postID|||source_url') of the source-URL file and add it to the correct 'item'
while IFS= read -r line
do
   postID=${line%|||*}
   url=${line##*|||}
   searchTerm="<wp:post_name>${postID}</wp:post_name>"
   sourceHTML="<br/><div class=\"sourcelink\">source: <a href=\"${url}\">${url}</a></div>"
   prefix="${sourceHTML}]]></content:encoded>${searchTerm}"

   # Escape '&', '|' and '\' so sed treats them literally in the replacement
   # ('&' in a query string would otherwise insert the matched text)
   prefix="$(sed 's/[&|\\]/\\&/g' <<<"$prefix")"

   #echo "$prefix"; sleep 1
   sed -i "s|$searchTerm|$prefix|g" \
      tumblr_oioiiooixiii-NEW-IMAGES-REMOVED-LINES-ITEM-PREFIXED.xml
done < source-FIXED.urls

# Do a visual test for any items that were missed
grep '<wp:post_name>' \
   < tumblr_oioiiooixiii-NEW-IMAGES-REMOVED-LINES-ITEM-PREFIXED.xml \
   > test-for-item-errors.text
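The visual check can be partly automated: each post ID from `source-FIXED.urls` should now appear exactly once directly after a closing `]]></content:encoded>`. A sketch, assuming the prefixed XML keeps that pair on one line, as the Step 10 sed writes it; `check_prefixed_items` is a hypothetical helper that prints the IDs that do not match exactly once.

```bash
# Sketch (hypothetical helper): for each 'postID|||url' line, count the lines
# of the XML file containing
# ']]></content:encoded><wp:post_name>postID</wp:post_name>'
# and print any post ID whose count is not exactly 1.
check_prefixed_items()
{
   local line postID count
   while IFS= read -r line
   do
      postID="${line%%|||*}"
      count="$(grep -cF -- "]]></content:encoded><wp:post_name>${postID}</wp:post_name>" "$2" || true)"
      [[ "$count" == 1 ]] || printf '%s (found %s)\n' "$postID" "$count"
   done < "$1"
}

# Usage: check_prefixed_items source-FIXED.urls \
#           tumblr_oioiiooixiii-NEW-IMAGES-REMOVED-LINES-ITEM-PREFIXED.xml
```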
###### Step 11 - Convert modified tumblr export XML file to Blogger version

# Upload the file to: http://www.wordpress-to-blogger-converter.appspot.com/
###### Step 12 - Import posts to Blogger blog

Go to the 'Settings' tab in the Blogger Dashboard, select 'Other', then 'Import Content'. Blogger will attempt to import all posts but may stop after a certain number. In testing, 900 posts were imported on the first attempt, then only 1 or 2 on repeated attempts. Some posts may contain errors that cause the process to hang, and these will need to be fixed, but usually the cause is simply an import limit enforced by Blogger, which resolves itself after 24 hours.
Re-hosted tumblr account: https://↯.blogspot.com
/* Blogger Notes - 'Dynamic Views' theme with the following custom CSS (mainly to remove annoying animations) */

*, *:before, *:after 
{
   transition-property: none !important;
   transform: none !important;
   animation: none !important;
}
#main 
{
   margin: 0px 0px !important;
   background-color:#3D3D3D !important;
}
.item { border: solid 0px #e3e3e3; }
.share-controls{ display: none !important; }