Parse
Upload and Parse Files
Upload Files
- Find the Upload Button: Inside your task dashboard, click the prominent “Upload Files” or “Add PDFs” button.
- Select Your Files: A file selection window will open. Hold down the `Ctrl` (Windows) or `Command` (Mac) key to select multiple PDF files at once for a batch upload.
- Begin Uploading: After making your selection, click “Open”. The upload process will start automatically.
Once the upload begins, look to the “File List” menu on the left side of the interface. Here, you will see a list of all added files and their current status.
Here are the key statuses and what they mean:
- Uploading: Appears immediately after you click the upload button and indicates that your file is being transferred to the system. Do not close the page or attempt to re-upload during this time.
- Waiting: Displays once the upload is successful. The file is ready for processing but is waiting for you to select a language and parsing model.
- Parsing: Shows up after you select the file, choose your language and parsing model, and click the “Parse” button. It indicates that the system is currently processing the file.
- Wait Parse: Appears when you start parsing several files at once. The first file in the queue will show `Parsing`, while the rest will display `Wait Parse` as they line up for processing.
- Success: Displays once parsing completes without errors. You can now view the results, edit layouts, or take further actions with the processed file.
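The status flow above can be pictured as a small state model. The sketch below is purely illustrative: the names `FileStatus`, `queue_statuses`, and `advance_queue` are hypothetical and not part of the product, and it assumes files are processed one at a time in queue order, with the next waiting file promoted when the current one finishes.

```python
from enum import Enum


class FileStatus(Enum):
    """Statuses shown in the File List (labels taken from the guide)."""
    UPLOADING = "Uploading"
    WAITING = "Waiting"
    WAIT_PARSE = "Wait Parse"
    PARSING = "Parsing"
    SUCCESS = "Success"


def queue_statuses(file_names):
    """Return the status each file shows right after a batch parse starts.

    Only the first file in the queue is actively parsed; the rest wait.
    """
    statuses = {}
    for position, name in enumerate(file_names):
        statuses[name] = (
            FileStatus.PARSING if position == 0 else FileStatus.WAIT_PARSE
        )
    return statuses


def advance_queue(statuses):
    """Mark the file being parsed as done and promote the next waiting file.

    Assumption: the queue is served in insertion order, one file at a time.
    """
    updated = dict(statuses)
    for name, status in updated.items():
        if status is FileStatus.PARSING:
            updated[name] = FileStatus.SUCCESS
            break
    for name, status in updated.items():
        if status is FileStatus.WAIT_PARSE:
            updated[name] = FileStatus.PARSING
            break
    return updated
```

For example, calling `queue_statuses(["a.pdf", "b.pdf", "c.pdf"])` puts `a.pdf` in `Parsing` and the other two in `Wait Parse`; one call to `advance_queue` then moves `a.pdf` to `Success` and `b.pdf` to `Parsing`.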
Parse Files
After your files are successfully uploaded, you can configure the parsing settings to get the highest quality results.
1. Step 1: Select a File and Set Configuration
- Select a File: In the “File List” on the left, click on a file that has the `Waiting` status. The selected file will be highlighted.
- Open the Configuration Panel: With the file selected, look to the top-right corner of the interface to find the “Parse Configuration” panel.
- Set Your Parameters:
  - PDF Main Language: From the dropdown menu, select the primary language of the document’s content (e.g., English, Chinese). A correct language selection greatly improves text recognition (OCR) accuracy.
  - Table Parsing Mode: This is a critical setting that determines the quality of table extraction. Three modes are available:
| Mode | Description | Accuracy | Speed | Best For |
|---|---|---|---|---|
| fast | Uses a default, general-purpose algorithm for table detection. | ⭐⭐ | Fastest | Simple, well-structured tables with clear borders, or when you need a quick preview. |
| accurate | Employs a specialized model optimized specifically for tables. | ⭐⭐⭐⭐ | Fast | Recommended for most use cases, especially effective for tables with complex but standard formats. |
| multi-modal | Leverages an advanced GGM (Graph Generative Model) that understands visual and structural cues. | ⭐⭐⭐⭐⭐ | Slower | Extremely complex tables, such as those in scanned documents, borderless tables, tables with merged cells, or irregular layouts. (Note: This mode may consume more Credits) |
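The table above can be read as a simple decision rule. The following is a minimal sketch of that rule, not a feature of the product: the function `suggest_table_mode` and its parameters are hypothetical, and the priority order (complex traits first, then speed) is an assumption drawn from the “Best For” column.

```python
def suggest_table_mode(
    scanned=False,
    borderless=False,
    merged_cells=False,
    irregular_layout=False,
    simple_bordered_tables=False,
    quick_preview=False,
):
    """Suggest a Table Parsing Mode from rough traits of the document.

    Returns one of the mode names from the table: "fast", "accurate",
    or "multi-modal".
    """
    # Any trait from the "multi-modal" row wins, since the lighter modes
    # are described as suited to simpler or standard table formats.
    if scanned or borderless or merged_cells or irregular_layout:
        return "multi-modal"
    # Simple bordered tables, or a quick preview, fit the "fast" row.
    if simple_bordered_tables or quick_preview:
        return "fast"
    # "accurate" is the mode the guide recommends for most use cases.
    return "accurate"
```

With no traits set, the sketch returns `accurate`, matching the recommendation for most use cases. Keep in mind that `multi-modal` is slower and may consume more Credits, so you may still prefer `accurate` for a first pass.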
2. Step 2: Start and Monitor the Process
- Start Parsing: Once you are satisfied with your configuration, click the “Start Parsing” button located in the bottom-right corner of the interface.
- Monitor the Status: You will see the file’s status update in the left-hand File List:
  - It will first change to `Parsing`, often indicated by a spinning icon, meaning the system is actively processing your file.
  - Upon completion, the status will change to `Success`.
3. Step 3: View the Results
When the file status shows `Success`, you’re all set!
Simply click on the file entry. You will be taken to the results page, where you can view, copy, or export all the extracted text and table data.
Adjusting and Correcting Results
The automated parsing engine may occasionally misinterpret highly complex document layouts. The correction tools described here give you full control to fine-tune the parsing results by hand.
1. Understanding the Five Parsed Element Types
After a successful parse, the content you see on the results page is composed of five basic element types:
- Text: A standard paragraph or block of text.
- Title: A heading or sub-heading within the document.
- Table: Extracted tabular data.
- Image: A picture or chart from the document.
- Formula: A mathematical or chemical formula.
2. How to Make Corrections: Two Common Scenarios
If the parsing result isn’t perfect, the problem usually falls into one of the following two categories, both of which are easy to fix.
- Scenario A: An Element’s Type is Incorrect
  - The Problem: The system identified a title as plain Text, or incorrectly classified a block of text as a Table.
  - How to Fix:
    - Select the Element: On the result preview page, click on the misidentified element. It will become highlighted with a bounding box.
    - Change the Type: A small toolbar or context menu will appear. From this menu, select the correct element type (e.g., change `Text` to `Title`).
- Scenario B: Some Content Was Missed (Not Detected)
  - The Problem: A paragraph, a small table, or an image on the PDF page was completely missed by the automated detection.
  - How to Fix:
    - Draw a Bounding Box: Move your cursor to the start of the missed content on the PDF preview, then click and drag to draw a rectangle that precisely encloses the entire missed area.
    - Assign a Type: When you release the mouse button, a menu will pop up, prompting you to assign an element type to this new selection (e.g., `Text`, `Table`).
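To make the two scenarios concrete, here is a minimal sketch of how the corrections could be modeled as data. Everything in it is an assumption for illustration: the `Element` class, the `(x0, y0, x1, y1)` box layout, and the helper functions are hypothetical and do not describe the product’s internals. Only the five type names come from the guide.

```python
from dataclasses import dataclass, replace

# The five element types listed in the guide.
ELEMENT_TYPES = ("Text", "Title", "Table", "Image", "Formula")


@dataclass(frozen=True)
class Element:
    page: int
    box: tuple  # (x0, y0, x1, y1); the coordinate system is assumed
    kind: str


def _check_kind(kind):
    if kind not in ELEMENT_TYPES:
        raise ValueError(f"unknown element type: {kind!r}")


def change_type(element, new_kind):
    """Scenario A: return a copy of the element with a corrected type."""
    _check_kind(new_kind)
    return replace(element, kind=new_kind)


def add_missed_element(elements, page, drag_start, drag_end, kind):
    """Scenario B: turn a mouse drag into a box and append a new element.

    The drag may go in any direction, so the corners are normalized.
    """
    _check_kind(kind)
    (xa, ya), (xb, yb) = drag_start, drag_end
    box = (min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb))
    return elements + [Element(page=page, box=box, kind=kind)]
```

In this model, retyping a heading is `change_type(element, "Title")`, and boxing a missed table is `add_missed_element(elements, 1, (300, 400), (50, 120), "Table")`, which yields a box of `(50, 120, 300, 400)` regardless of the drag direction.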
3. Final Step: Re-Parse the Page
After performing either of the correction actions above (changing a type or drawing a new box):
- Configure Parameters: Ensure your new or modified element is still selected. Then, navigate to the “Parse Configuration” panel in the top-right corner. Re-select the appropriate parameters if needed (e.g., choose `accurate` or `multi-modal` if you’ve just boxed a complex table).
- Initiate Re-Parse: Click the “Parse This Page” button located below the configuration options.
- Wait for the Refresh: The system will now perform a new analysis on this page only, incorporating your manual corrections. After a moment, the parsed content on the page will refresh automatically to show the corrected result.
Please Note: This is a page-level operation. It will only affect the page you are currently editing and will not alter the content of any other pages.
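The page-level scope can be illustrated with a short sketch. It is a hypothetical model, not the product’s implementation: `reparse_page` and the `parse_page` callback are invented names, and the document is assumed to be a mapping from page number to that page’s parsed content.

```python
def reparse_page(pages, page_number, parse_page):
    """Return a copy of `pages` in which only `page_number` is re-parsed.

    `pages` maps page numbers to parsed content. `parse_page` is a
    caller-supplied function (hypothetical) that takes the page number
    and its current content and returns the new content.
    """
    if page_number not in pages:
        raise KeyError(f"no such page: {page_number}")
    updated = dict(pages)  # shallow copy: other pages are carried over as-is
    updated[page_number] = parse_page(page_number, pages[page_number])
    return updated
```

Because only one key is replaced, every other page in the result is the same object as before, which mirrors the guarantee that re-parsing one page leaves the rest of the document untouched.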