HARPA.AI

📜  Parse URLs Page content

Extracts every URL found in the {{gpt}} parameter and parses the content of each linked page. Call it from your own commands with the “command” step.

Created by Morteza
Updated on Feb 17, 20:23
Installed 6 times
RUNS JS CODE

Content

- type: js
  args: gpt
  code: |-
    function isGptTooShort(args) {
      // Read the gpt parameter from the step arguments
      const gpt = args.gpt;

      // A missing value counts as too short
      if (gpt === undefined || gpt === null) {
        return true;
      }

      // Convert to a string and strip surrounding whitespace
      const trimmedGpt = String(gpt).trim();

      // Too short means fewer than 15 characters
      return trimmedGpt.length < 15;
    }

    // Test the argument
    return isGptTooShort(args);
  param: boolean
  timeout: 15000
  onFailure: MESSAGE GPT PARAMETER IS EMPTY
  silent: true
  label: CHECK IF GPT STEP HAS LESS THAN 15 CHARACTERS
- type: jump
  to: MESSAGE GPT PARAMETER IS EMPTY
  condition: '{{boolean}} = true'
- steps:
    - message: ⏳ Give me a minute to search through the web for you...
      type: say
    - type: js
      args: gpt
      code: |-
        function extractAllLinksToArray(args) {
          const text = args.gpt;
          
          if (!text) return [];
          
          // Combined regex pattern for:
          // 1. Markdown links: [text](url)
          // 2. Plain URLs: http(s)://example.com
          const urlPattern = /\[[^\]]+\]\(([^)]+)\)|https?:\/\/[^\s<>)"]+/g;
          
          // Array to store all URLs
          const links = [];
          
          // Find all matches
          let match;
          while ((match = urlPattern.exec(text)) !== null) {
            // If it's a markdown link (has capture group), use the URL from group
            // Otherwise use the full match (for plain URLs)
            const url = match[1] || match[0];
            
            // Avoid duplicates
            if (!links.includes(url)) {
              links.push(url);
            }
          }
          
          return links;
        }

        return extractAllLinksToArray(args);
      param: extractAllLinksToArray
      timeout: 15000
      onFailure: CREATING RELEVANT URLS
      silent: true
      label: EXTRACT ALL URLS INTO ARRAY
    - param: total
      format: number
      value: '{{extractAllLinksToArray.length}}'
      type: calc
      func: set
      label: SET LINKS LENGTH
    - steps:
        - label: TRYING TO FETCH
          param: content
          value: '{{page {{item}}}}'
          type: calc
          func: set
          format: ''
        - onFailure: SET PAGE URL
          type: js
          code: |-
            // Rough size estimate: about 4 characters per token, 0.75 words per token
            const content = args['content'];
            return {
              chars: content.length,
              estimatedTokens: Math.ceil(content.length / 4),
              estimatedWords: Math.ceil(content.length / 4 * 0.75)
            };
          param: count
          timeout: 15000
          label: COUNT
          silent: true
          args: content
        - param: pageUrl
          value: '{{item}}'
          type: calc
          func: set
          format: text
          label: SET PAGE URL
        - code: |-
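            // Extract the hostname (without protocol or "www.") from the page URL
            const regex = /^(?:https?:\/\/)?(?:www\.)?([^\/]+)/;

            // Read the URL from args, as the other JS steps in this command do
            const pageUrl = String(args.pageUrl || '');
            const matches = pageUrl.match(regex);

            // Fall back to an empty string when no hostname can be found
            return matches ? matches[1] : '';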
          param: hostname
          label: HOSTNAME
          type: js
          args: pageUrl
          timeout: 15000
          silent: true
        - func: increment
          param: index
          delta: 1
          type: calc
        - args: index, total
          code: |-
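            function calculatePercentage(index, total) {
                index = Number(index);
                total = Number(total);

                // Guard against non-numeric input and division by zero
                if (isNaN(index) || isNaN(total) || total === 0) {
                    return "Error";
                }

                return (index / total * 100).toFixed(1) + "%";
            }

            // Read both values from args, as the other JS steps in this command do
            return calculatePercentage(args.index, args.total);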
          param: percentage
          label: PERCENTAGE INDEX TOTAL
          type: js
          timeout: 15000
          silent: true
        - steps:
            - type: js
              args: content
              code: |-
                function formatTextToInfoJson(args) {
                  const text = args.content;
                  
                  if (!text) return {};
                  
                  const cleanText = text
                    .replace(/"/g, '\\"')
                    .replace(/\r\n/g, '\\n')
                    .replace(/\n/g, '\\n')
                    .replace(/\t/g, '\\t');
                    
                  const jsonObject = {
                    info: cleanText
                  };
                  
                  return jsonObject;
                }

                // Store the result in 'data'
                args.data = formatTextToInfoJson(args);
                return args.data;
              param: data
              timeout: 15000
              onFailure: ''
              silent: true
              label: CONVERT PAGE CONTENT TO JSON
            - func: extract-json
              index: first
              type: calc
              to: data
              param: data
            - param: data.url
              format: auto
              type: calc
              func: set
              value: '{{item}}'
            - func: list-add
              index: last
              list: array
              type: calc
              item: data
            - type: jump
              to: SAY STATUS
          condition: '{{content}} =~ ^[\s\S]{1000,}$'
          type: group
          label: FETCHED
        - steps:
            - args: item
              code: |-
                // Get URL from args parameter
                const url = args['item'];

                if (!url) {
                  return false;
                }

                const fullUrl = url.startsWith('http') ? url : 'https://' + url;

                window.location.href = fullUrl;
                return true;
              param: navigate.boolean
              onFailure: WAIT UNTIL PAGE LOADED
              label: NAVIGATE
              type: js
              timeout: 15000
              silent: true
            - type: wait
              for: custom-delay
              delay: '1500'
              silent: true
              label: WAIT UNTIL PAGE LOADED
            - type: js
              args: page
              code: |-
                function formatTextToInfoJson(args) {
                  const text = args.page;
                  
                  if (!text) return {};
                  
                  const cleanText = text
                    .replace(/"/g, '\\"')
                    .replace(/\r\n/g, '\\n')
                    .replace(/\n/g, '\\n')
                    .replace(/\t/g, '\\t');
                    
                  const jsonObject = {
                    info: cleanText
                  };
                  
                  return jsonObject;
                }

                // Store the result in 'data'
                args.data = formatTextToInfoJson(args);
                return args.data;
              param: data
              timeout: 15000
              onFailure: ABORT NAVIGATE
              label: CONVERT PAGE CONTENT TO JSON
              silent: true
            - type: calc
              func: extract-json
              to: data
              param: data
              index: first
            - type: calc
              func: set
              param: data.url
              format: auto
              value: '{{item}}'
            - type: calc
              func: list-add
              index: last
              list: array
              item: data
          label: NOT FETCHED
          type: group
        - message: |-
            🔍 Analyzed **{{index}} / {{total}}**, [{{hostname}}]({{pageUrl}})
            - Chars: {{count.chars}}
            - Estimated Tokens: {{count.estimatedTokens}}
            - Estimated Words: {{count.estimatedWords}}
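          label: ⛔️SAY STATUS
          # This step appears to be disabled on purpose: {{total1}} is never set in this command, so the condition should not match.
          condition: '{{index}} = {{total1}}'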
          type: say
        - message: |-
            🔍 Analyzed **{{percentage}}**, [{{hostname}}]({{pageUrl}})
            - Chars: {{count.chars}}
            - Estimated Tokens: {{count.estimatedTokens}}
            - Estimated Words: {{count.estimatedWords}}
          condition: '{{index}} != {{total}}'
          type: say
          label: SAY STATUS
        - code: |-
            const content = args['array'];
            const stringContent = JSON.stringify(content);

            return {
              chars: stringContent.length,
              estimatedTokens: Math.ceil(stringContent.length / 4),
              estimatedWords: Math.ceil(stringContent.length / 4 * 0.75)
            };
          param: arrayLength
          label: ARRAY LENGTH
          type: js
          args: array
          timeout: 15000
          onFailure: SAY STATUS
          silent: true
        - message: |-
            ✅ **{{percentage}}** pages scanned. 

            Last checked page: [{{hostname}}]({{pageUrl}})
            - Chars: {{count.chars}}
            - Estimated Tokens: {{count.estimatedTokens}}
            - Estimated Words: {{count.estimatedWords}}
            - Array Token length: **{{arrayLength.estimatedTokens}}**
          condition: '{{index}} = {{total}}'
          type: say
          label: SAY STATUS
      type: loop
      list: extractAllLinksToArray
  label: DEPTH = 1
  type: group
- type: say
  message: '{{array}}'
  label: DISPLAY ALL PARSED URL PAGE CONTENT
- type: say
  message: |-
    Note: To use the content of a single parsed page, reference it by its index:

    `{{array.0.info}}`
    `{{array.1.info}}`
    `{{array.2.info}}`

    Use `{{array}}` to display the entire result.
  interpolate: false
- type: say
  message: The {{gpt}} parameter is empty or shorter than 15 characters.
  label: MESSAGE GPT PARAMETER IS EMPTY
  interpolate: false
  condition: '{{boolean}} = true'
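
The link extraction in the EXTRACT ALL URLS INTO ARRAY step can be tried outside HARPA. The following is a minimal standalone sketch, assuming plain Node.js or a browser console; `extractLinks` and `sampleText` are illustrative names that do not appear in the command, and the regex is copied from the step unchanged.

```javascript
// Standalone sketch of the EXTRACT ALL URLS INTO ARRAY step.
// `extractLinks` and `sampleText` are hypothetical names used only for illustration.
function extractLinks(text) {
  if (!text) return [];

  // Same combined pattern as the command: markdown links first, then plain URLs.
  const urlPattern = /\[[^\]]+\]\(([^)]+)\)|https?:\/\/[^\s<>)"]+/g;

  const links = [];
  for (const match of text.matchAll(urlPattern)) {
    // Group 1 is set only for markdown links; otherwise use the full match.
    const url = match[1] || match[0];

    // Keep the first occurrence of each URL and skip duplicates.
    if (!links.includes(url)) links.push(url);
  }
  return links;
}

const sampleText = [
  'See [HARPA](https://harpa.ai) and https://example.com/docs for details.',
  'Duplicate: https://example.com/docs',
].join('\n');

const links = extractLinks(sampleText);
console.log(links); // [ 'https://harpa.ai', 'https://example.com/docs' ]
```

One thing this sketch makes visible: the plain-URL branch stops only at whitespace, `<`, `>`, `)` or `"`, so a URL followed directly by a period or comma keeps that character. If the {{gpt}} text is ordinary prose, trimming trailing punctuation before fetching may be worth considering.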
Notice: Please read before using

This automation command is created by a community member. HARPA AI team does not audit community commands.

Please review the command carefully and only install if you trust the creator.
